GPU Based Fast Phylogenetic Tree Construction Algorithm With Reduce Dataset

Date
2016-09
Authors
Ibrahim, Najihah
Abstract
The rapid growth of new genomic data, the enhancement and fusion of genomic data analysis methods, and the exploitation of technological innovations designed for high-performance computing are the main interests of this research. Sequence analysis, a branch of genomic data analysis, is used to analyse and manipulate homologous genomic data, and phylogenetic tree construction is one of its methods for inferring the evolutionary relationships among the genomic data. However, constructing a phylogenetic tree requires an initial step: sequence alignment. This research showed that the input genomic dataset must be aligned before phylogenetic tree construction takes place. Sequence alignment arranges the genomic data so that similar regions can be identified. It is an important step because raw homologous genomic datasets are usually not standardized and contain unknown characters. A large number of sequence alignment programs are now available, which makes selecting an ideal program to align a dataset more difficult. Preliminary experiments showed that MAFFT aligned the sequence datasets better than ClustalW, Kalign, MUSCLE and T-Coffee. The result of sequence alignment is an aligned dataset, which was used as the input for constructing a phylogenetic tree. Many programs, using various methods, are available for constructing a phylogenetic tree. A comparative study was conducted on the methods of several notable phylogenetic tree construction programs: GARLI, MrBayes, Tree Puzzle and FastTree. The evaluation showed that FastTree offers many robust methods for constructing a phylogenetic tree, such as the neighbor-joining method and the profile-based method for arranging the positions of the nodes and taxa in the tree.
Through the tree construction experiments, we found that the selection of aligned sequences can also affect the phylogenetic tree construction process and its result. Hence, a method called Half-parsimonious was introduced to increase the quality of the aligned dataset. The Half-parsimonious method was able to reduce the size of the dataset while keeping the informative sites. It increased the maximum likelihood score and the branch length of the phylogenetic tree while decreasing the processing time of the construction process. The informative aligned dataset was then used as the input for an integration of phylogenetic tree construction methods. Our experiments show that the new integrated method is able to increase the maximum likelihood score and the branch length of the phylogenetic tree. However, the processing time of this integration increased because of the exhaustive search algorithm implemented in the construction process. Hence, the construction process was accelerated using a many-core processor, the Graphics Processing Unit (GPU). The processing time of the accelerated phylogenetic tree construction process was reduced by almost 94% compared with the original process, while the accuracy of the maximum likelihood score and the branch length was maintained. This research constructed an accurate phylogenetic tree with good branch lengths and a lower processing time for the construction process.
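To illustrate what reducing an aligned dataset while keeping informative sites can look like, the following minimal Python sketch filters an alignment down to its parsimony-informative columns, using the standard definition (at least two character states, each present in at least two sequences). It is not the Half-parsimonious method itself, whose exact rule the abstract does not specify. The function names, the toy nucleotide alignment and the set of characters treated as missing data are all assumptions made for illustration.

```python
from collections import Counter

# Characters treated as missing data (gaps and unknown nucleotides).
# This set is an assumption for nucleotide alignments.
MISSING = set("-?NXnx")


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def keep_informative_sites(alignment):
    """Return (reduced alignment, kept column indices) for a name -> sequence dict."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to equal length")
    names = list(alignment)
    # Transpose the rows (sequences) into columns (sites).
    columns = list(zip(*(alignment[n] for n in names)))
    kept = [i for i, col in enumerate(columns) if is_parsimony_informative(col)]
    reduced = {n: "".join(alignment[n][i] for i in kept) for n in names}
    return reduced, kept


# Hypothetical toy alignment: four taxa, eight aligned sites.
toy = {
    "taxon_a": "ACGTAC-A",
    "taxon_b": "ACGTTCGA",
    "taxon_c": "ATGAAC-C",
    "taxon_d": "ATGATCGC",
}
reduced, kept_columns = keep_informative_sites(toy)
print(kept_columns)  # indices of the retained columns
print(reduced)
```

In this toy case the constant columns and the gap-dominated column are dropped, so the tree construction step would see four sites instead of eight.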
Keywords
The input genomic dataset must be aligned before the phylogenetic tree construction process takes place.