Feature Selection And Enhanced Krill Herd Algorithm For Text Document Clustering
Date
2018-03
Authors
Abualigah, Laith Mohammad Qasim
Publisher
Universiti Sains Malaysia
Abstract
Text document (TD) clustering is a new trend in text mining in which TDs
are separated into several coherent clusters, such that documents in the same cluster
are similar. This study proposes a new method for solving the TD clustering problem
that works in two stages. (i) A new feature selection method, which uses the particle
swarm optimization algorithm with a novel weighting scheme, and a detailed dimension
reduction technique are proposed to obtain a new subset of more informative features
in a low-dimensional space. This subset is used to improve the performance of
the text clustering (TC) algorithm in the subsequent stage and to reduce its computation
time. The k-means clustering algorithm is used to evaluate the effectiveness of the
obtained subsets. (ii) Four krill herd algorithms (KHAs), namely, (a) basic KHA, (b)
modified KHA, (c) hybrid KHA, and (d) multi-objective hybrid KHA, are proposed to
solve the TC problem; each algorithm is an incremental improvement on the preceding
version. For evaluation, seven benchmark text datasets with different
characteristics and complexities are used. Results show that the proposed methods and
algorithms obtained the best results compared with the other methods
published in the literature.
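The first stage described in the abstract (weight the terms, keep an informative subset of features, then judge the subset with k-means) can be illustrated with a minimal sketch. This is not the thesis's method: plain TF-IDF stands in for the novel weighting scheme, a simple ranking by mean weight stands in for the particle swarm optimization search, and the function names (tfidf, select_features, project, kmeans) and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) of plain TF-IDF weights for tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        term_freq = Counter(doc)
        rows.append([term_freq[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, rows


def select_features(rows, keep):
    """Stand-in for the PSO search: keep the `keep` columns with the highest mean weight."""
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / len(rows) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: means[j], reverse=True)
    return sorted(ranked[:keep])


def project(rows, cols):
    """Reduce every document vector to the selected columns."""
    return [[row[j] for j in cols] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Basic k-means with deterministic farthest-point seeding (an assumption, not from the source)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two documents about krill swarms, two about text clustering.
docs = [
    "krill krill herd swarm".split(),
    "krill swarm swarm search".split(),
    "text text cluster document".split(),
    "text cluster cluster corpus".split(),
]
vocab, rows = tfidf(docs)
cols = select_features(rows, keep=4)
labels = kmeans(project(rows, cols), k=2)
print([vocab[j] for j in cols], labels)
```

On this toy corpus the reduced space keeps half of the vocabulary, and the two krill documents should land in one cluster and the two text documents in the other. A search-based selector such as PSO would replace select_features and would typically score each candidate subset by the quality of the clustering it yields.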
Keywords
Feature selection and enhanced krill herd algorithm, for text document clustering