Parallelization of maximum clique algorithm for protein sequence clustering

Loading...
Thumbnail Image
Date
2008-06
Authors
Al-Qudami, Nasser Ali Mohammed
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Clustering algorithms are widely used in bioinformatics to classify biological data. Graph-based clustering is one of the methods used in protein sequences clustering. This method transforms the clustering process into a graph problem via representing the sequences as vertices and their similarities as weighted edges. In this method, a sub-graph is defined as a core cluster. The clique algorithm has been employed in the clustering method to find a maximum sub-graph (or clique) in which each vertex is linked by an edge. However, applying this clustering method on large protein databases leads to the generation of large graphs in which finding a sub-graph ( clique) of maximum size became quite difficult and time consuming. This work focuses on parallelizing a maximum clique algorithm, which uses a branch-andbound technique to find a clique in a graph, in order to enhance clique algorithm performance and simultaneously conferring a positive impact on overall clustering performance. The algorithm has been implemented using OpenMP and executed in a multi-core computer which supports the multi-threads programming. A 7.9 speedup obtained by executing the algorithm with eight threads. The main benefit of this study was its successful reduction of a 71 minute problem (or dataset consisting of 1800 protein sequences) into a 9 minute problem.
Description
Keywords
Graph-based clustering is one of the methods , used in protein sequences clustering.
Citation