Evaluating the suitability of normalized google similarity and individual match ratio average as measures for protein similari

Date
2008-06
Authors
Lee, Jun Choi
Abstract
Biological sequence comparison faces various challenges. Although dynamic programming is claimed to give the optimal solution for the comparison process, its computational cost and some fundamental challenges still make it inefficient for large-scale sequence comparison. Statistical methods explore the statistics of sequences through the frequency of words or partitions in the sequence; they not only provide a solution without loss of statistical information but also address some of the fundamental problems in sequence comparison. Normalized Google Distance is a way of finding semantic similarity in web pages, and it has characteristics relevant to this task. In this study, the suitability of Normalized Google Similarity and Individual Match Ratio Average for representing the statistical significance of proteins in protein sequence comparison is examined. The potential of the proposed similarity measures is evaluated through correlation coefficient and accuracy, with FASTA as the reference benchmark. The study shows that protein similarity measurement based on overlapping K-tuples gives better overall results than measurement based on non-overlapping K-tuples. Both Normalized Google Similarity and Individual Match Ratio Average show capability in representing protein sequence comparison.
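The distinction between overlapping and non-overlapping K-tuples, and the general form of the Normalized Google Distance, can be illustrated with a minimal Python sketch. The function names `k_tuples` and `ngd` and the example sequence are hypothetical and chosen only for illustration. The distance formula is the standard Normalized Google Distance defined on occurrence counts; how the thesis maps K-tuple frequencies onto those counts, and how it derives Normalized Google Similarity or Individual Match Ratio Average from them, is not stated in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def k_tuples(seq, k, overlapping=True):
    """Split a sequence into words of length k.

    Overlapping tuples advance one residue at a time;
    non-overlapping tuples advance k residues at a time.
    Trailing residues that do not fill a whole tuple are dropped.
    """
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def ngd(fx, fy, fxy, n):
    """Standard Normalized Google Distance from occurrence counts.

    fx, fy : number of items containing term x, term y
    fxy    : number of items containing both terms
    n      : total number of items (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never co-occur are infinitely distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical example sequence, not taken from the study.
seq = "MKVLAT"
overlap_words = k_tuples(seq, 2, overlapping=True)
block_words = k_tuples(seq, 2, overlapping=False)
word_counts = Counter(overlap_words)  # K-tuple frequency table
```

Overlapping extraction yields more tuples from the same sequence than non-overlapping extraction, so the frequency tables it produces are denser.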
Keywords
Normalized Google Distance is a way of finding semantic similarity in web pages