An Adaptive Outlier Detection For Scatter Points Of Unascertained Models
Date
2016-02
Authors
Davinna Jeremiah
Publisher
Universiti Sains Malaysia
Abstract
Outlier detection is the identification of unusual patterns in data. This research
presents a new method of detecting outliers found in multivariate scatter data, where
outliers are those points that lie far away from the majority of points. One of the challenges
in outlier detection is the difficulty of determining the distribution that models the
scatter data. This is due to certain inherent characteristics of the data, for example, its
skewness and kurtosis. Owing to these characteristics, it is nearly impossible
for the right distribution model to be determined without prior knowledge or
user input. This problem is aggravated when data are multivariate, where the scatter of
data points cannot be visually inspected. Another problem commonly seen in existing
techniques is the need for precise inputs for various parameters. Examples
of these are the parameters of a kernel function, the number of clusters to
be identified, and threshold values. These parameters are known to strongly influence
the detection’s final outcome. Thus, if incorrect inputs are given, the final
outcome can be very inaccurate. The main objective of this research is to propose an
unsupervised method that detects outliers in scatter data where the patterns are such
that the distribution model is not easily ascertained. The second objective is to have
a method that is adaptive, with fewer parameters and with input values that are easy to
determine. By meeting these objectives, a more intelligent way of
detection can be achieved. In the proposed method, data are clustered for the purpose of
detecting and eliminating dense clusters, which usually do not contain outliers.
To identify dense clusters, the density of each cluster is determined through a
new method of computation. The dense clusters are then differentiated from the sparse
ones through an adaptive technique, with no user input required. All these steps are
performed iteratively until only the sparse clusters remain, which may potentially
contain outliers. Then, for the true outliers to be finally detected, several point proximity
computations are carried out. To verify the detection’s accuracy and efficiency,
evaluations were performed using several scatter data sets with different distributions.
The results pertaining to accuracy were especially favourable. Compared to
existing methods, the F1 score of the proposed method was shown to be higher by at least
55.6%. The proposed method has also been successfully applied in detecting image
anomalies.
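The pipeline outlined in the abstract (cluster, score cluster density, discard dense clusters without a user-supplied threshold, then run proximity checks on what remains) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose details the abstract does not give. Every concrete choice here is an assumption made for illustration: the linking radius (twice the median nearest-neighbour distance), the density formula, the largest-gap cut, and all function names are hypothetical. The sketch also clusters once and then peels dense clusters repeatedly, which simplifies the iterative procedure described above. It assumes at least two distinct points. The f1_score helper is the standard F1 definition used to evaluate detections against known outliers.

```python
import math
from statistics import median

def nn_distance(i, points):
    """Distance from points[i] to its nearest other point."""
    return min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)

def link_clusters(points, radius):
    """Stand-in clustering step: connected components in which two points
    are linked when they lie within `radius` of each other."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        stack, members = [unvisited.pop()], []
        while stack:
            i = stack.pop()
            members.append(i)
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= radius}
            unvisited -= near
            stack.extend(near)
        clusters.append(members)
    return clusters

def density(members, points, radius):
    """Hypothetical density: member count / (mean distance to centroid + radius)."""
    dim = len(points[0])
    centroid = [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]
    spread = sum(math.dist(points[i], centroid) for i in members) / len(members)
    return len(members) / (spread + radius)

def detect_outliers(points):
    """Return the set of indices flagged as outliers (illustrative only)."""
    # Data-derived radius, so the caller supplies no parameters (assumed rule).
    radius = 2 * median(nn_distance(i, points) for i in range(len(points)))
    clusters = link_clusters(points, radius)
    scored = sorted((density(c, points, radius), c) for c in clusters)
    # Peel dense clusters: cut at the widest gap in sorted densities, repeat
    # until one cluster remains or all remaining densities are equal.
    while len(scored) > 1:
        gaps = [scored[k + 1][0] - scored[k][0] for k in range(len(scored) - 1)]
        widest = max(gaps)
        if widest <= 0:
            break
        scored = scored[: gaps.index(widest) + 1]
    candidates = [i for _, members in scored for i in members]
    # Proximity check: keep candidates whose nearest neighbour is beyond radius.
    return {i for i in candidates if nn_distance(i, points) > radius}

def f1_score(true_outliers, flagged):
    """F1 score: harmonic mean of precision and recall over index sets."""
    tp = len(true_outliers & flagged)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(flagged), tp / len(true_outliers)
    return 2 * precision * recall / (precision + recall)

# A dense 5x5 grid (indices 0..24) plus two far-away points (indices 25, 26).
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
points += [(20.0, 20.0), (-15.0, 10.0)]
flagged = detect_outliers(points)
print(sorted(flagged))              # [25, 26]
print(f1_score({25, 26}, flagged))  # 1.0
```

The same functions accept points of any dimension, since math.dist works on coordinate tuples of equal length. That matters for the multivariate case the abstract highlights, where the scatter cannot be inspected visually.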
Keywords
Outlier detection