Publication: Extended nearest centroid neighbor method with training set reduction for classification
Date
2020-06-01
Authors
Mukahar, Nordiana
Abstract
The k-Nearest Centroid Neighbor (kNCN) is a well-known non-parametric classifier that shows remarkable classification performance. Nevertheless, it suffers from slow classification time and a one-sided selection of nearest centroid neighbors, which degrades classification accuracy. This thesis first presents four variants of training-set reduction techniques, termed Reduced Set k-Nearest Centroid Neighbor v1 to v4 (RSkNCN.v1, RSkNCN.v2, RSkNCN.v3 and RSkNCN.v4), to reduce the classification time of the kNCN. Atypical samples are first removed using Wilson's Edited kNCN, and the retained fraction of the training set is computed using the maximum or optimum rank of training samples (those that agree with the majority of their k nearest centroid neighbors). Experimental results on 30 real-world data sets from the UCI Repository and the FV-USM image database show that the proposed training-set reduction techniques achieve the best performance in terms of reduction ratio and classification time compared with the benchmark techniques (Wilson's Edited, Iterative and Limited kNCNs). All the proposed techniques give satisfactory classification accuracy except RSkNCN.v4, which performs poorly: it applies an aggressive sample-removal strategy, so training samples carrying useful information may be discarded, reducing classification accuracy. Regarding the second problem of the kNCN, this thesis proposes a new Reduced Set Extended k-Nearest Centroid Neighbor (RSENCN) classifier to improve the classification accuracy of the kNCN classifier. The proposed RSENCN classifier captures more class information by considering nearest centroid neighbors from the views of both training and test samples.
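For readers unfamiliar with the selection rule underlying these methods, the following is a minimal sketch of kNCN classification together with a Wilson-style editing pass that drops samples misclassified by their own nearest centroid neighbors. It assumes Euclidean distance and the standard incremental centroid-selection rule; all function names are illustrative and this is not the thesis implementation.

```python
import numpy as np

def kncn_neighbors(X, x, k):
    """Select k nearest centroid neighbors of query x.

    The first neighbor is the plain nearest neighbor; each subsequent
    neighbor is the remaining sample whose inclusion makes the centroid
    of all selected samples closest to the query (standard kNCN rule).
    """
    d = np.linalg.norm(X - x, axis=1)
    selected = [int(np.argmin(d))]
    remaining = [i for i in range(len(X)) if i != selected[0]]
    for _ in range(1, k):
        centroid_sum = X[selected].sum(axis=0)
        best, best_dist = None, np.inf
        for j in remaining:
            cen = (centroid_sum + X[j]) / (len(selected) + 1)
            dist = np.linalg.norm(cen - x)
            if dist < best_dist:
                best, best_dist = j, dist
        selected.append(best)
        remaining.remove(best)
    return selected

def kncn_classify(X, y, x, k=3):
    """Majority vote among the k nearest centroid neighbors."""
    idx = kncn_neighbors(X, x, k)
    labels, counts = np.unique(np.asarray(y)[idx], return_counts=True)
    return labels[np.argmax(counts)]

def wilson_edit(X, y, k=3):
    """Wilson-style editing: keep only samples whose leave-one-out
    kNCN prediction matches their own label (atypical samples are dropped)."""
    y = np.asarray(y)
    keep = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        if kncn_classify(X[mask], y[mask], X[i], k) == y[i]:
            keep.append(i)
    return X[keep], y[keep]
```

The incremental centroid criterion is what distinguishes kNCN from plain kNN: later neighbors are chosen for how they balance the centroid around the query, not merely for proximity, which is the "spatial distribution" factor the abstract refers to.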
Experimental comparisons and statistical significance analysis confirm that the proposed RSENCN classifier outperforms the benchmark classifiers (kNCN, DWkNCN, kNN, DWkNN, FkNN, ENN, MkNN, kGNN), yielding the highest classification accuracies of 88.56%, 89.20% and 83.90% on the 30 real-world data sets, the I-4I data set and the FV-USM image database, respectively. In conclusion, the findings reveal that a small subset yielding a high reduction ratio and fast classification time can be obtained using the maximum or optimum rank of training samples (those that agree with the majority of their k nearest centroid neighbors). The findings also reveal that the spatial distribution and two-sided consideration of nearest centroid neighbors lead to consistent improvements in the classification performance of the RSENCN classifier.