Far East Journal of Theoretical Statistics
Volume 13, Issue 2, Pages 215 - 232
(July 2004)
USING A COMPRESSIBILITY MEASURE TO DISTINGUISH CODING AND NONCODING DNA
Armin Shmilovici (Israel) and Irad Ben-Gal (Israel)
Abstract: DNA sequences consist of protein coding and noncoding regions. Recognition of coding regions is an important phase in gene-finding procedures. This paper presents a new method for distinguishing coding and noncoding DNA regions.
The proposed method implements compressibility measures that result from Variable Order Markov (VOM) models. In contrast to fixed-order Markov models, where the model order is identical for all positions and for all contexts, in VOM models the order may vary with the nucleotide position and its context. As a result, VOM models are more flexible with respect to model parameterization. Preliminary experimental results on benchmark data sets demonstrate that the proposed methodology classifies coding and noncoding DNA more accurately than traditional coding measures presented in the literature.
Keywords and phrases: DNA compression, variable order Markov model, coding and noncoding DNA, context-tree.
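
The idea of a context-dependent model order and a compressibility score can be illustrated with a minimal sketch. This is not the authors' context-tree algorithm: the function names, the `min_count` threshold used to choose the context length, and the add-one smoothing are all assumptions made for illustration. The sketch predicts each nucleotide from the longest preceding context that was seen often enough in training, and reports the average code length in bits per base (lower means more compressible under the trained model).

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(sequence, max_order):
    """Count next-symbol occurrences for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(sequence):
        for k in range(min(i, max_order) + 1):
            context = sequence[i - k:i]  # k == 0 gives the empty context
            counts[context][symbol] += 1
    return counts


def pick_context(counts, history, max_order, min_count):
    """Return the longest suffix of `history` seen at least `min_count` times.

    This is where the order varies: well-supported long contexts are used
    when available, otherwise the model backs off to shorter ones.
    (Hypothetical selection rule, assumed for this sketch.)
    """
    for k in range(min(len(history), max_order), 0, -1):
        context = history[-k:]
        if context in counts and sum(counts[context].values()) >= min_count:
            return context
    return ""


def bits_per_base(counts, sequence, max_order, min_count=5):
    """Average code length (bits per nucleotide) of `sequence` under the model."""
    if not sequence:
        return 0.0
    total_bits = 0.0
    for i, symbol in enumerate(sequence):
        history = sequence[max(0, i - max_order):i]
        context = pick_context(counts, history, max_order, min_count)
        seen = counts[context]
        total = sum(seen.values())
        # Add-one smoothing so unseen symbols keep a nonzero probability.
        prob = (seen.get(symbol, 0) + 1) / (total + len(ALPHABET))
        total_bits -= math.log2(prob)
    return total_bits / len(sequence)


if __name__ == "__main__":
    model = train_counts("ACG" * 200, max_order=3)
    print(bits_per_base(model, "ACG" * 30, max_order=3))
    print(bits_per_base(model, "T" * 30, max_order=3))
```

Under these assumptions, a classifier could train one model on known coding sequences and one on known noncoding sequences, then assign a new fragment to whichever model gives it the smaller bits-per-base score.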