Classification of microarray datasets using random forest

dc.contributor.author: Ee Ling, Ng
dc.date.accessioned: 2015-09-28T07:52:24Z
dc.date.available: 2015-09-28T07:52:24Z
dc.date.issued: 2009-06
dc.description.abstract [en_US]: DNA microarray technology makes it possible to monitor the expression of tens of thousands of genes in a biological sample on a single chip. Medical fields can benefit from microarray data mining, as it helps in the early detection of gene mutations and in the diagnosis of disease. A well-built model can be used to predict unknown disease classes in a test case. Building such a model requires good classification results, which depend heavily on the classifier used. However, in most microarray data the number of genes far exceeds the number of samples. Thus, it is essential not only to choose the right type of classifier but also to select the significant genes that will contribute to good classification results. Gene selection also varies with the scope of a study and depends on the criteria researchers are interested in. In this study, we propose a stair-line method to select significant genes that reduces the effect of kurtosis found among the genes. Classification is then done using Random Forest. Five microarray datasets with different numbers of genes and samples are used to demonstrate the effectiveness of this method. The method improves the percentage of correct classifications while reducing the effect of kurtosis in the gene expression values. Other conventional classification schemes are also compared with Random Forest, and Random Forest is shown to outperform them. In short, Random Forest gave competitive classification results, performing consistently well on all datasets.
dc.identifier.uri: http://hdl.handle.net/123456789/1254
dc.language.iso: en [en_US]
dc.subject: Microarray datasets [en_US]
dc.subject: Random forest [en_US]
dc.title: Classification of microarray datasets using random forest [en_US]
dc.type: Thesis [en_US]
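The abstract does not describe how the stair-line method works, so it is not reproduced here. As a minimal, hypothetical sketch of the general idea of screening genes by kurtosis before classification, the code below computes the population excess kurtosis of each gene across samples and keeps the k genes with the smallest absolute value. The function names `excess_kurtosis` and `select_low_kurtosis_genes`, and the "lowest absolute kurtosis" selection rule, are illustrative assumptions, not the thesis's method.

```python
import math
from typing import List, Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution).

    Returns math.inf for a constant input, so a gene with no variance
    (and therefore no information) is ranked last by the filter below.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        return math.inf
    return m4 / (m2 ** 2) - 3.0


def select_low_kurtosis_genes(matrix: List[List[float]], k: int) -> List[int]:
    """Hypothetical filter: indices of the k genes (columns) whose expression
    values have the smallest absolute excess kurtosis.

    `matrix` is laid out samples x genes, the usual microarray layout.
    """
    n_genes = len(matrix[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in matrix]  # one gene across all samples
        scores.append((abs(excess_kurtosis(column)), g))
    scores.sort()  # smallest |kurtosis| first; ties broken by gene index
    return sorted(g for _, g in scores[:k])


if __name__ == "__main__":
    # Toy matrix: 4 samples x 3 genes (values are made up for illustration).
    X = [
        [1.0, 5.0, 2.0],
        [2.0, 5.0, 2.1],
        [3.0, 5.0, 1.9],
        [4.0, 5.0, 9.0],
    ]
    keep = select_low_kurtosis_genes(X, k=2)
    reduced = [[row[g] for g in keep] for row in X]
    print(keep, reduced)
```

The reduced matrix, containing only the selected columns, would then be passed to a classifier; scikit-learn's `RandomForestClassifier` is one common Random Forest implementation that accepts such a samples-by-genes matrix with a vector of class labels.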