Parallelization of maximum clique algorithm for protein sequence clustering
Loading...
Date
2008-06
Authors
Al-Qudami, Nasser Ali Mohammed
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Clustering algorithms are widely used in bioinformatics to classify biological
data. Graph-based clustering is one of the methods used in protein sequences
clustering. This method transforms the clustering process into a graph problem via
representing the sequences as vertices and their similarities as weighted edges. In this
method, a sub-graph is defined as a core cluster. The clique algorithm has been
employed in the clustering method to find a maximum sub-graph (or clique) in which
each vertex is linked by an edge. However, applying this clustering method on large
protein databases leads to the generation of large graphs in which finding a sub-graph
( clique) of maximum size became quite difficult and time consuming. This work
focuses on parallelizing a maximum clique algorithm, which uses a branch-andbound
technique to find a clique in a graph, in order to enhance clique algorithm
performance and simultaneously conferring a positive impact on overall clustering
performance. The algorithm has been implemented using OpenMP and executed in a
multi-core computer which supports the multi-threads programming. A 7.9 speedup
obtained by executing the algorithm with eight threads. The main benefit of this
study was its successful reduction of a 71 minute problem (or dataset consisting of
1800 protein sequences) into a 9 minute problem.
Description
Keywords
Graph-based clustering is one of the methods , used in protein sequences clustering.