Feature Selection And Enhanced Krill Herd Algorithm For Text Document Clustering

Date

2018-03

Authors

Abualigah, Laith Mohammad Qasim

Publisher

Universiti Sains Malaysia

Abstract

Text document (TD) clustering is a new trend in text mining in which TDs are separated into several coherent clusters such that documents in the same cluster are similar. This study proposes a new method for solving the TD clustering problem that works in two stages. (i) A new feature selection method using the particle swarm optimization algorithm with a novel weighting scheme, together with a detailed dimension reduction technique, is proposed to obtain a new subset of more informative features in a low-dimensional space. This new subset is used to improve the performance of the text clustering (TC) algorithm in the subsequent stage and to reduce its computation time. The k-means clustering algorithm is used to evaluate the effectiveness of the obtained subsets. (ii) Four krill herd algorithms (KHAs), namely, (a) basic KHA, (b) modified KHA, (c) hybrid KHA, and (d) multi-objective hybrid KHA, are proposed to solve the TC problem; each algorithm is an incremental improvement on the preceding version. For the evaluation, seven benchmark text datasets with different characteristics and complexities are used. Results show that the proposed methods and algorithms obtained the best results among the comparative methods published in the literature.
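
The idea of scoring a candidate feature subset by clustering the reduced documents with k-means can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`mask_features`, `kmeans`, `subset_fitness`), the use of cosine similarity, and the average-similarity-to-centroid score are assumptions made for illustration, and the binary mask is supplied by hand here rather than searched for by particle swarm optimization.

```python
import math
import random


def mask_features(vectors, mask):
    """Keep only the term columns whose mask entry is truthy (hypothetical helper)."""
    return [[v for v, keep in zip(vec, mask) if keep] for vec in vectors]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign(vectors, centroids):
    """Label each document with the index of its most similar centroid."""
    return [
        max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
        for v in vectors
    ]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means with cosine similarity (an assumed similarity measure)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the last centroid update.
    return assign(vectors, centroids), centroids


def subset_fitness(vectors, mask, k, seed=0):
    """Average similarity of each reduced document to its cluster centroid."""
    reduced = mask_features(vectors, mask)
    labels, centroids = kmeans(reduced, k, seed=seed)
    total = sum(cosine(v, centroids[lab]) for v, lab in zip(reduced, labels))
    return total / len(reduced)


if __name__ == "__main__":
    # Toy term-weight vectors: the third term appears in every document.
    docs = [[2, 0, 1], [1, 0, 1], [0, 1, 1], [0, 3, 1]]
    print(subset_fitness(docs, [1, 1, 1], k=2))  # all terms kept
    print(subset_fitness(docs, [1, 1, 0], k=2))  # shared term dropped
```

In a setup like this, a search procedure would propose many masks and keep those with higher fitness; the toy data only shows how a mask changes the vectors that k-means sees.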

Keywords

Feature selection and enhanced krill herd algorithm, for text document clustering

URI

http://hdl.handle.net/123456789/7693

Collections

Pusat Pengajian Sains Komputer - Tesis
