Evaluating the suitability of normalized google similarity and individual match ratio average as measures for protein similari
Date
2008-06
Authors
Lee, Jun Choi
Abstract
Biological sequence comparison faces various challenges. Although the dynamic
programming solution is claimed to be optimal for the comparison process,
computational limitations and some fundamental challenges still make it inefficient
for mass sequence comparison. Statistical methods explore the statistics of sequences
through the frequency of words, or partitions, in the sequence. They not only provide a
solution without loss of statistical information, but also address some of the
fundamental problems in sequence comparison. Normalized Google Distance, a way of
finding semantic similarity between web pages, has characteristics closely related to
these methods. This study examines the suitability of Normalized Google Similarity and
Individual Match Ratio Average for representing the statistical significance of proteins
in protein sequence comparison. The potential of the proposed similarity
measurements is evaluated through the correlation coefficient and accuracy, with FASTA
as the reference benchmark. The study shows that protein similarity measurement
based on overlapping K-tuples gives better overall results than measurement based on
non-overlapping K-tuples. Both Normalized Google Similarity and Individual Match
Ratio Average show the capability to represent protein sequence comparison.
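The abstract does not spell out how K-tuple counts feed the similarity measures, so the following Python is only a minimal sketch under stated assumptions. It splits a protein sequence into overlapping (step 1) or non-overlapping (step k) K-tuples and plugs the counts into the standard Normalized Google Distance formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The mapping is hypothetical: f(x) and f(y) are the K-tuple counts of each sequence, f(x, y) is the number of shared K-tuples, N defaults to f(x) + f(y), and Normalized Google Similarity is taken as max(0, 1 - NGD). Individual Match Ratio Average is not sketched because its definition is not given here. A plain Pearson correlation helper is included for comparing the resulting scores against reference scores such as FASTA output, which must be obtained separately.

```python
import math
from collections import Counter


def k_tuples(seq, k, overlapping=True):
    """Split seq into K-tuples: step 1 if overlapping, step k otherwise.
    Any trailing residues shorter than k are dropped."""
    step = 1 if overlapping else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def ngd_from_counts(fx, fy, fxy, n):
    """Standard Normalized Google Distance computed from raw counts."""
    if fxy == 0:
        return math.inf  # nothing shared: treat as maximally distant
    lx, ly, lxy, ln_n = (math.log(v) for v in (fx, fy, fxy, n))
    denom = ln_n - min(lx, ly)
    if denom <= 0:
        raise ValueError("n must exceed min(fx, fy)")
    return (max(lx, ly) - lxy) / denom


def normalized_google_similarity(a, b, k=2, overlapping=True, n=None):
    """Hypothetical NGS = max(0, 1 - NGD) over the K-tuple counts of a and b."""
    ta = Counter(k_tuples(a, k, overlapping))
    tb = Counter(k_tuples(b, k, overlapping))
    fx, fy = sum(ta.values()), sum(tb.values())
    fxy = sum((ta & tb).values())  # shared K-tuples (multiset intersection)
    if n is None:
        n = fx + fy  # assumption: the "collection" is just these two sequences
    return max(0.0, 1.0 - ngd_from_counts(fx, fy, fxy, n))


def pearson(xs, ys):
    """Pearson correlation coefficient, e.g. NGS scores vs. reference scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(sxx * syy)
```

For example, comparing "ACDEFGHIK" with "ACDEFGWYV" at k = 2 gives about 0.32 with overlapping K-tuples and about 0.58 with non-overlapping ones; toy strings like these say nothing about which scheme performs better on real proteins, which is what the FASTA-benchmarked evaluation addresses.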
Keywords
Normalized Google Distance is a way of finding semantic similarity in web pages