Publication:
Optimized fuzzy c-mean clustering Algorithm with a new cluster Validity index

Date
2024-11-01
Authors
Ahmed Khaldoon Abdalameer, Al-Zubaidi
Abstract
The Fuzzy C-mean (FCM) clustering algorithm is widely used to group similar data points into the same cluster and dissimilar data points into different clusters. Its main limitation is sensitivity to the initial locations of the centroids. Researchers have addressed this problem by integrating FCM with nature-inspired optimization algorithms, but these integrations have limitations of their own, such as getting trapped in local optima, slow convergence, and difficulty in identifying the optimum number of clusters. To reduce these problems, this research proposes two enhanced FCM-based clustering algorithms and one new cluster validity index (CVI). The first algorithm is Fuzzy C-mean Clustering integrated with a Hybrid Artificial Bee Colony (FC-HABC). The HABC algorithm is used to locate better cluster centroids and pass them to the FCM algorithm for final clustering. On 15 real-world datasets, the FC-HABC algorithm outperformed 6 state-of-the-art clustering algorithms in terms of clustering accuracy, purity, F-score and Friedman tests. The new CVI, called the Validity Clustering Index based on finding the Mean of Clustered data (VCIM), was designed to find the optimum number of clusters. The basic concept of VCIM is to use the mean of the produced clusters to find better centroid locations. Tested with different clustering algorithms on 15 real-world datasets, VCIM outperformed 6 state-of-the-art CVIs in finding the optimum number of clusters. Finally, the Optimized Fuzzy C-mean Clustering algorithm with a new Validity Index (OFC-HABC) is proposed to address both the sensitivity to initial centroid locations and the manual determination of the optimum number of clusters.
In the OFC-HABC algorithm, the HABC algorithm addresses the sensitivity to initial centroid locations by finding better centroid locations, which are passed to the VCIM to automatically find the optimum number of clusters. The FCM algorithm then performs the final clustering. When tested on 15 real-world datasets, the OFC-HABC algorithm outperformed 6 state-of-the-art clustering algorithms in terms of clustering accuracy, purity, F-score and Friedman tests. The good performance of the proposed FC-HABC, OFC-HABC, and VCIM shows their potential for use in real-world data clustering applications.
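The pipeline described in the abstract (find initial centroids, choose the number of clusters with a validity index, then run FCM) can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract gives no formulas for HABC or VCIM, so the sketch substitutes random sampling of data points for the HABC centroid search and the classical partition coefficient for VCIM. The function names `fcm`, `partition_coefficient` and `choose_c`, and all parameter defaults, are assumptions made for illustration only.

```python
import numpy as np


def fcm(X, init_centroids, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means, started from the supplied centroids.

    The result depends on init_centroids, which is the sensitivity the
    abstract refers to. An optimizer such as HABC would supply them here.
    """
    V = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # Squared distance from every point to every centroid, shape (n, c).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update: mean of the data weighted by u_ik^m.
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift < tol:
            break
    return V, U


def partition_coefficient(U):
    """Classical validity index (higher is better). Stand-in for VCIM."""
    return float((U ** 2).sum() / U.shape[0])


def choose_c(X, c_max=6, seed=0):
    """Scan c = 2..c_max and keep the c with the best validity score.

    Initial centroids are random data points: a placeholder for the
    HABC search described in the abstract, not a reproduction of it.
    """
    rng = np.random.default_rng(seed)
    scores = {}
    for c in range(2, c_max + 1):
        init = X[rng.choice(len(X), size=c, replace=False)]
        _, U = fcm(X, init)
        scores[c] = partition_coefficient(U)
    best_c = max(scores, key=scores.get)
    return best_c, scores
```

Calling `choose_c(X)` on an `(n, d)` array returns the selected number of clusters and the score for each candidate. A final `fcm` run with that number of clusters then gives the centroids and the fuzzy membership matrix.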