Publication: Optimized fuzzy c-mean clustering Algorithm with a new cluster Validity index
Date
2024-11-01
Authors
Ahmed Khaldoon Abdalameer, Al-Zubaidi
Abstract
The Fuzzy C-mean (FCM) clustering algorithm is widely used to group
similar data points into the same cluster and dissimilar data points into different clusters. The
FCM algorithm has one main limitation: its sensitivity to the initial locations of the
centroids. Researchers have addressed this problem by integrating FCM with nature-inspired
optimization algorithms, but these integrations have their own limitations, such as getting
trapped in local optima, slow convergence rates, and difficulty in identifying the
optimum number of clusters. To reduce these problems, this research proposes two
enhanced FCM-based clustering algorithms and one new cluster validity index (CVI).
The first algorithm is Fuzzy C-mean Clustering integrated with a Hybrid
Artificial Bee Colony (FC-HABC). The HABC algorithm is used to locate better cluster
centroids and pass them to the FCM algorithm for final clustering. On 15 real-world
datasets, the FC-HABC algorithm outperformed 6 state-of-the-art clustering
algorithms in terms of clustering accuracy, purity, F-score and Friedman tests. The new
proposed CVI, called the Validity Clustering Index based on finding the Mean of
Clustered data (VCIM), was designed to find the optimum number of clusters. The basic
concept of the proposed VCIM is the use of the mean of the produced clusters to find
better centroid locations. Tested with different clustering algorithms and 15 real-world
datasets, the proposed VCIM outperformed 6 state-of-the-art CVIs in finding the
optimum number of clusters. Finally, the Optimized Fuzzy C-mean Clustering
algorithm with a new Validity Index (OFC-HABC) is proposed to address the
problems of sensitivity to the initial centroid locations and manual determination
of the optimum number of clusters. In the OFC-HABC algorithm, the HABC algorithm
addresses the initialization sensitivity by finding
better centroid locations, which are passed to the VCIM to automatically find the
optimum number of clusters. The FCM algorithm then performs the final clustering
process. The OFC-HABC algorithm outperformed 6 state-of-the-art clustering
algorithms in terms of clustering accuracy, purity, F-score and Friedman tests when
tested on 15 real-world datasets. The good performance of the proposed
FC-HABC, OFC-HABC, and VCIM shows their potential for use in real-world
data clustering applications.
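
The abstract describes the FCM stage as receiving centroids from an upstream optimizer and then performing the final clustering. The sketch below illustrates only that final stage, using the standard textbook FCM update rules; it is not the thesis implementation, and it contains no HABC or VCIM logic. The function name `fcm`, its parameters (`init_centroids`, `m`, `max_iter`, `tol`), and the sample data are assumptions made for illustration. The `init_centroids` argument marks the point where an optimizer such as HABC could supply its candidate centroids.

```python
import math


def fcm(points, init_centroids, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means, started from caller-supplied centroids.

    Hypothetical helper for illustration; not the FC-HABC algorithm itself.
    Returns (centroids, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j, computed in the last update step.
    """
    centroids = [list(c) for c in init_centroids]
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        # Distances are clamped away from zero to avoid division by zero.
        memberships = []
        for x in points:
            dists = [max(math.dist(x, c), 1e-12) for c in centroids]
            memberships.append(
                [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            )
        # Centroid update: mean of the points weighted by u_ij^m.
        new_centroids = []
        for j in range(len(centroids)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centroids.append(
                [sum(w * x[d] for w, x in zip(weights, points)) / total
                 for d in range(dim)]
            )
        # Stop once no centroid moves by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, memberships


if __name__ == "__main__":
    # Two well-separated toy groups (made-up data, not from the thesis).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, u = fcm(data, init_centroids=[(0.0, 0.0), (10.0, 10.0)])
    print(centers)
```

Because plain FCM only refines the centroids it is given, a poor choice of `init_centroids` can lead it to a poor local solution, which is the initialization sensitivity the abstract refers to.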