An Adaptive Outlier Detection For Scatter Points Of Unascertained Models

Date
2016-02
Authors
Davinna Jeremiah
Publisher
Universiti Sains Malaysia
Abstract
Outlier detection is the identification of unusual patterns in data. This research presents a new method for detecting outliers in multivariate scatter data, where outliers are points that lie far from the majority of points. One challenge in outlier detection is the difficulty of determining which distribution models a set of scatter data. This difficulty stems from certain inherent characteristics of the data, for example its skewness and kurtosis. Owing to these characteristics, it is nearly impossible to determine the right distribution model without prior knowledge or user input. The problem is aggravated when the data are multivariate, because the scatter of the data points cannot be visually inspected. Another problem common to existing techniques is the need for precise inputs for various parameters, such as the parameters of a kernel function, the number of clusters to be identified, and threshold values. These parameters strongly influence the final outcome of the detection, so incorrect inputs can make the outcome very inaccurate. The main objective of this research is to propose an unsupervised method that detects outliers in scatter data whose patterns are such that the distribution model is not easily ascertained. The second objective is for the method to be adaptive, with fewer parameters and with input values that are easy to determine. Meeting these objectives allows detection to be performed more intelligently. In the proposed method, data are clustered in order to detect and eliminate dense clusters, which usually do not contain outliers. To identify dense clusters, the density of each cluster is determined through a new method of computation. The dense clusters are then differentiated from the sparse ones through an adaptive technique that requires no user input.
All these steps are performed iteratively until only the sparse clusters, which may contain outliers, remain. The true outliers are then detected through several point proximity computations. To verify the accuracy and efficiency of the detection, several evaluations were performed on scatter data sets with different distributions. The results pertaining to accuracy were especially favourable: compared to existing methods, the F1 score of the proposed method was shown to be higher by at least 55.6%. The proposed method has also been successfully applied to detecting image anomalies.
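For readers unfamiliar with the accuracy measure cited in the abstract, the F1 score is the harmonic mean of precision and recall. The sketch below shows how it could be computed for an outlier detector, assuming outliers are identified by point index. The function name `f1_score` and the example index sets are hypothetical illustrations and are not taken from the thesis or its data.

```python
def f1_score(true_outliers, detected):
    """F1 score of a detection, given sets of point indices.

    Assumption: outliers are identified by integer index; this is an
    illustrative convention, not the thesis's evaluation code.
    """
    true_outliers, detected = set(true_outliers), set(detected)
    # True positives: points that are outliers and were flagged as such.
    tp = len(true_outliers & detected)
    if tp == 0:
        # No correct detections: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / len(detected)       # fraction of flagged points that are real outliers
    recall = tp / len(true_outliers)     # fraction of real outliers that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 real outliers, 4 points flagged, 2 of them correct.
truth = {3, 17, 42}
found = {3, 17, 58, 60}
score = f1_score(truth, found)  # precision 1/2, recall 2/3, so F1 = 4/7
```

Because F1 penalises both missed outliers and false alarms, a detector cannot score well simply by flagging many points or very few.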
Keywords
Outlier detection
Citation