Feature Selection And Enhanced Krill Herd Algorithm For Text Document Clustering

Date
2018-03
Authors
Abualigah, Laith Mohammad Qasim
Publisher
Universiti Sains Malaysia
Abstract
Text document (TD) clustering is a new trend in text mining in which TDs are separated into several coherent clusters, where documents in the same cluster are similar. This study proposes a new method for solving the TD clustering problem that works in two stages: (i) A new feature selection method using the particle swarm optimization algorithm, together with a novel weighting scheme and a detailed dimension reduction technique, is proposed to obtain a new subset of more informative features in a low-dimensional space. This new subset is used to improve the performance of the text clustering (TC) algorithm in the subsequent stage and to reduce its computation time. The k-means clustering algorithm is used to evaluate the effectiveness of the obtained subsets. (ii) Four krill herd algorithms (KHAs), namely, (a) basic KHA, (b) modified KHA, (c) hybrid KHA, and (d) multi-objective hybrid KHA, are proposed to solve the TC problem; each algorithm is an incremental improvement on the preceding version. For the evaluation process, seven benchmark text datasets with different characteristics and complexities are used. Results show that the proposed methods and algorithms obtained better results than the comparative methods published in the literature.
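The first stage described above, in which a candidate feature subset is evaluated by clustering the reduced documents with k-means, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy documents, the plain TF-IDF weighting (the thesis proposes its own weighting scheme), the hand-written binary mask standing in for a subset that a particle swarm optimizer might propose, and the helper names `tfidf`, `apply_mask`, and `kmeans`.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, matrix) of plain TF-IDF weights, one row per document."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows

def apply_mask(rows, mask):
    """Keep only the columns whose mask bit is 1 (a candidate feature subset)."""
    return [[w for w, keep in zip(row, mask) if keep] for row in rows]

def kmeans(rows, init_idx, iters=10):
    """Lloyd's k-means, squared Euclidean distance, centroids seeded from rows[init_idx]."""
    centroids = [list(rows[i]) for i in init_idx]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: two documents on optimization, two on text clustering.
docs = [
    "krill herd swarm optimization",
    "swarm optimization particle",
    "text document clustering",
    "document clustering text mining",
]
vocab, rows = tfidf(docs)

# Hand-picked mask (assumption): keep the terms that occur in more than one document.
# vocab order: clustering, document, herd, krill, mining, optimization, particle, swarm, text
mask = [1, 1, 0, 0, 0, 1, 0, 1, 1]
reduced = apply_mask(rows, mask)

labels = kmeans(reduced, init_idx=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

In this sketch the mask cuts the feature space from nine terms to five while the two topical groups remain separable. A search procedure such as the one described in the abstract would compare many such masks rather than a single hand-picked one.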
Keywords
Feature selection and enhanced krill herd algorithm, for text document clustering
Citation