Artificial Bee Colony With Differential Evolution Algorithm For Feature Extraction And Selection Of Mass Spectrometry Data

Date

2016-05

Authors

Mohamed Yusoff, Syarifah Adilah

Abstract

Advances in mass spectrometry techniques for proteomic studies have accelerated the discovery of biomarkers from quantitative proteomic patterns. High-throughput data for a given molecule can give rise to a series of inter-related and overlapping peaks in a mass spectrum. The spectrum suffers from high dimensionality relative to the small sample size. Several studies have proposed statistical and machine learning techniques such as Principal Component Analysis (PCA), Independent Component Analysis (ICA) and wavelet coefficients to extract potential features. However, none of these methods takes into account the huge number of features relative to the small sample size. This study focused on two stages of mass spectrometry analysis. Firstly, feature extraction methods extract peaks as potential features to infer the biological meaning of the data. Shrinkage estimation of covariance was proposed to assemble m/z windows and identify the correlation coefficients among peaks of mass spectrometry data for feature extraction. Secondly, feature selection techniques search for parsimonious features through a learning model that exhibits the most accurate results. A computational technique that mimics survival and natural processes, known as Artificial Bee Colony (ABC), integrated with a linear SVM classifier was proposed for feature selection. Later, this was hybridised with Differential Evolution (DE) techniques (the deABC algorithm) in order to expand the exploration of basic ABC. The proposed method was tested on several real-world high-resolution mass spectrometry datasets, namely the ovarian cancer, liver (HCC) and drug-induced toxicity (TOX) datasets, to evaluate discrimination power, accuracy, sensitivity and specificity. For feature extraction, the results were compared with reported studies. The shrinkage estimation achieved better discriminative analysis on similar features.
For feature selection, comparisons were made with the Particle Swarm Optimisation (PSO) and Ant Colony Optimisation (ACO) algorithms and with reported studies. The proposed deABC feature selection algorithm achieved accuracies of 98.44, 88.89 and 93.75 percent on the ovarian cancer, TOX and liver (HCC) datasets respectively and, on average, outperformed PSO, ACO and a similar reported study.
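
The abstract does not state how deABC combines the two search strategies, so the following is only a minimal sketch of the two textbook operators such a hybrid would draw on: the standard ABC neighbour move and the standard DE/rand/1 mutation. The function names, the scale factor F = 0.5, the 0.5 threshold used to turn a continuous position into a feature subset, and the toy population are all assumptions made for illustration and are not taken from the thesis. The fitness step (for example, the accuracy of a linear SVM trained on the selected peaks) is deliberately left out.

```python
import numpy as np

def abc_neighbour(pop, i, rng):
    """Standard ABC move: perturb one coordinate of food source i."""
    n, d = pop.shape
    k = rng.choice([s for s in range(n) if s != i])  # partner source, k != i
    j = rng.integers(d)                               # coordinate to change
    phi = rng.uniform(-1.0, 1.0)
    candidate = pop[i].copy()
    candidate[j] = pop[i, j] + phi * (pop[i, j] - pop[k, j])
    return candidate

def de_rand_1(pop, i, rng, F=0.5):
    """Standard DE/rand/1 mutation from three distinct members other than i."""
    n = pop.shape[0]  # needs at least 4 members
    r1, r2, r3 = rng.choice([s for s in range(n) if s != i], size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def to_mask(position, threshold=0.5):
    """Hypothetical encoding: select feature j when position[j] > threshold."""
    return np.asarray(position) > threshold

rng = np.random.default_rng(0)
pop = rng.uniform(0.0, 1.0, size=(6, 10))  # 6 candidate subsets over 10 peaks
print(to_mask(abc_neighbour(pop, 0, rng)))  # subset proposed by the ABC move
print(to_mask(de_rand_1(pop, 0, rng)))      # subset proposed by the DE move
```

The ABC move changes a single coordinate and therefore searches close to the current solution, whereas the DE mutation recombines three other members and can move much further. This is consistent with the stated aim of expanding the exploration of basic ABC, although the actual integration used in the thesis may differ.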

Keywords

Feature extraction methods extract peaks as potential features, to infer biological meaning of the data.

URI

http://hdl.handle.net/123456789/3391

Collections

Pusat Pengajian Sains Komputer - Tesis
