Classification of microarray datasets using random forest

dc.contributor.author	Ee Ling, Ng
dc.date.accessioned	2015-09-28T07:52:24Z
dc.date.available	2015-09-28T07:52:24Z
dc.date.issued	2009-06
dc.description.abstract	DNA microarray technology makes it possible to monitor the expression of tens of thousands of genes in a biological sample on a single chip. Medical fields can benefit from microarray data mining, as it helps in the early detection of gene mutations and the diagnosis of disease. A well-built model can be used to predict unknown disease classes in a test case. Building such a model requires good classification results, which depend heavily on the classifier being used. However, in most microarray data, the number of genes usually exceeds the number of samples. Thus, it is not only the choice of classifier that is essential but also the criteria used to select the significant genes that contribute to good classification results. Gene selection also varies with the scope of the study and depends on the criteria researchers are interested in. In this study, we propose a stair-line method for selecting significant genes that reduces the effect of kurtosis found among the genes. Classification is then done using Random Forest. Five microarray datasets with different numbers of genes and samples are used to demonstrate the effectiveness of this method. The method improves the percentage of correct classifications and at the same time reduces the effect of kurtosis in the gene expression values. Other conventional classification schemes are also examined for comparison with Random Forest, and Random Forest is shown to be superior to the others. In short, Random Forest gave competitive results in classifying genes correctly and performed consistently well on all datasets.	en_US
dc.identifier.uri	http://hdl.handle.net/123456789/1254
dc.language.iso	en	en_US
dc.subject	Microarray datasets	en_US
dc.subject	Random forest	en_US
dc.title	Classification of microarray datasets using random forest	en_US
dc.type	Thesis	en_US

Collections

Pusat Pengajian Sains Matematik - Tesis