Keywords and phrases: virus RNAs, DNA walks, metric-based binary walk algorithm, ATG walk, SARS-CoV-2 virus, MERS-CoV virus, Dengue virus, Ebola virus.
Received: November 6, 2023; Revised: December 22, 2023; Accepted: February 16, 2024; Published: April 20, 2024
How to cite this article: A. Belinsky and G. A. Kouzaev, DNA walks in virus genomics, JP Journal of Biostatistics 24(2) (2024), 251-286. http://dx.doi.org/10.17654/0973514324017
This Open Access Article is Licensed under Creative Commons Attribution 4.0 International License
References:
[1] H. Fletcher and I. Hickey, Genetics, 4th ed., Garland Science, 2013.
[2] G. Meister, RNA Biology: An Introduction, Wiley-VCH, 2011.
[3] N. Cristianini and M. Hahn, Introduction to Computational Genomics: A Case Studies Approach, Cambridge University Press, 2012.
[4] A. Pinho, S. Garcia, D. Pratas and P. J. S. G. Ferreira, DNA sequences at a glance, PLoS One 8 (2013), e79922(1-11).
[5] GenBank® [www.ncbi.nlm.nih.gov/genbank].
[6] Global Initiative on Sharing All Influenza Data (GISAID) [www.gisaid.org].
[7] J. Blayney et al., Super-enhancers include classical enhancers and facilitators to fully activate gene expression, Cell 186 (2023), 5826-5839.
[8] W. Li, T. Marr and K. Kaneko, Understanding long-range correlations in DNA sequences, Phys. D 75 (1994), 392-416.
[9] G. Villani, Affinity and correlation in DNA, Multidisciplinary Sci. J. 5 (2022), 214-231.
[10] J. Berger, S. Mitra, M. Carli and A. Neri, Visualization and analysis of DNA sequences using DNA walks, J. Franklin Inst. 341 (2004), 37-53.
[11] M. Tibatan and M. Sarısaman, Unitary structure of palindromes in DNA, Biosystems 211 (2022), 104565(1-8).
[12] P. Vaidyanathan, Genomics and proteomics: a signal processor’s tour, IEEE Circ. Syst. Mag. 4 (2004), 7-29.
[13] J. Lorenzo-Ginori, A. Rodríguez-Fuentes, R. Ábalo and R. S. Rodríguez, Digital signal processing in the analysis of genomic sequences, Current Bioinformatics 4 (2009), 28-40.
[14] A. Belinsky and G. Kouzaev, Visual and quantitative analyses of virus genomic sequences using a metric-based algorithm, WSEAS Trans. Circ. Syst. 21 (2022), 323-348.
[15] A. Belinsky and G. Kouzaev, Geometrical study of virus RNA sequences, BioRxiv preprint: 2021.09.06.459135. https://doi.org/10.1101/2021.09.06.459135; Europe PMC: PPR: PPR391263.
[16] G. Kouzaev, The geometry of ATG-walks of the Omicron SARS-CoV-2 virus RNAs, BioRxiv preprint: https://doi.org/10.1101/2021.12.20.473613; Europe PMC: PPR: PPR435860.
[17] H. Kwan and S. Arniker, Numerical representation of DNA sequences, Proc. 2009 IEEE Int. Conf. Electro/Information Technology, Windsor, ON, Canada, 2009, pp. 307-310.
[18] C. Cattani, Complex representation of DNA sequences, M. Elloumi et al., eds., Bioinformatics Research and Development, BIRD 2008, Communications in Computer and Information Science, Vol. 13, Springer, 2008, pp. 528-537.
[19] E. Hamori and J. Raskin, Curves, a novel method of representation of nucleotide series especially suited for long DNA sequences, J. Biol. Chem. 258 (1983), 1318-1327.
[20] M. Gates, Simpler DNA sequence representations, Nature 316 (1985), 219.
[21] R. Voss, Evolution of long-range fractal correlations and noise in DNA sequences, Phys. Rev. Lett. 68 (1992), 3805-3808.
[22] A. Nandy, A new graphical representation and analysis of DNA sequence structure, I. Methodology and applications to globin genes, Curr. Sci. 66 (1994), 309-314.
[23] A. Nandy, Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences, Cabios 12 (1996), 55-62.
[24] P. Leong and S. Morgenthaler, Random walk and gap plots of DNA sequences, Comput. Appl. Biosci. 11 (1995), 503-507.
[25] B. Hewelt et al., The DNA walk and its demonstration of deterministic chaos-relevance to genomic alterations in lung cancer, Bioinformatics 35 (2019), 2738-2748.
[26] A. Nandy et al., Characterizing the Zika virus genome - a bioinformatics study, Curr. Comp. Aided Drug Design 12 (2016), 87-97.
[27] S. S. T. Yau et al., DNA sequence representation without degeneracy, Nucleic Acids Res. 31 (2003), 3078-3080.
[28] C. Yu, M. Deng and S. S. T. Yau, DNA sequence comparison by a novel probabilistic method, Inform. Sci. 181 (2011), 1484-1492.
[29] T. Cover and J. Thomas, Elements of Information Theory, J. Wiley and Sons, 1991.
[30] J. Berger et al., New approaches to genome sequence analysis based on digital signal processing, Proc. Workshop on Genomic Signal Processing and Statistics (GENSIPS), IEEE, Raleigh, North Carolina, USA, 11-13 Oct. 2002, CP2-08, pp. 1-4.
[31] P. Cristea, Conversion of nucleotide sequences into genomic signals, J. Cell. Mol. Med. 6 (2002), 279-303.
[32] L. Das, S. Nanda and J. Das, An integrated approach for identification of exon locations using recursive Gauss Newton tuned Kaiser window, Genomics 111 (2019), 284-296.
[33] A. Brodzik and O. Peters, Symbol-balanced quaternion periodicity transform for latent pattern detection in DNA sequences, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP’05), Philadelphia, PA, USA, 2005, Vol. 5, pp. v/373-v/376.
[34] Z. J. Zhang, DV-curve: a novel intuitive tool for visualizing and analyzing DNA sequences, Bioinformatics 25 (2009), 1112-1117.
[35] A. Nandy, M. Harle and S. Basak, Mathematical descriptors of DNA sequences: development and applications, Arkivoc (2006), 211-238.
[36] H. Kwan and S. Arniker, Numerical representation of DNA sequences, Proc. 2009 IEEE Int. Conf. Electro/Inf. Technol., Windsor, ON, Canada, 2009, pp. 307-310.
[37] M. Randić, M. Novič and D. Plavšić, Milestones in graphical bioinformatics, Int. J. Quantum Chem. 113 (2013), 2413-2446.
[38] V. Aram, A. Iranmanesh and Z. Majid, Spider representations of DNA sequences, J. Comput. Theor. Nanoscience 11 (2014), 418-420.
[39] Y. Li, Q. Liu and X. Zheng, DUC-curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment, Phys. A 456 (2016), 256-270.
[40] Z. Mo et al., One novel representation of DNA sequence based on the global and local position information, Sci. Rep. 8 (2018), 7592(1-7).
[41] G. S. Xie et al., Graphical representations and similarity analysis of DNA sequences based on trigonometric functions, Acta Biotheor. 66 (2018), 113-133.
[42] B. Lee, Squiggle: a user-friendly two-dimensional DNA sequence visualization tool, Bioinformatics 35 (2018), 1425-1426.
[43] J. Moroz and P. Nelson, Torsional directed walks, entropic elasticity, and DNA twist stiffness, Proc. Natl. Acad. Sci. USA 94 (1997), 14418-14422.
[44] M. Randić, 2-D graphical representation of proteins based on physico-chemical properties of amino acids, Chem. Phys. Lett. 476 (2009), 281-286.
[45] M. Mahmoodi-Reihani, F. Abbasitabar and V. Zare-Shahabadi, A novel graphical representation and similarity analysis of protein sequences based on physicochemical properties, Phys. A 510 (2018), 477-485.
[46] N. Marascio et al., Molecular characterization and cluster analysis of SARS-CoV-2 viral isolates in Kahramanmaras city, Turkey: The Delta VOC wave within one month, Viruses 15 (2023), 802(1-12).
[47] C. Peng et al., Long-range correlations in nucleotide sequences, Nature 356 (1992), 168-170.
[48] A. Grigoriev, Analyzing genomes with cumulative skew diagrams, Nucleic Acids Res. 26 (1998), 2286-2290.
[49] X. Q. Qi, J. Wen and Z. H. Qi, New 3D graphical representation of DNA sequence based on dual nucleotides, J. Theor. Biol. 249 (2007), 681-690.
[50] C. Li et al., Novel graphical representation and numerical characterization of DNA sequences, Appl. Sci. 6 (2016), 63(1-15).
[51] F. Bai et al., Vector representation and its application of DNA sequences based on nucleotide triplet codons, J. Mol. Graph. Model. 62 (2015), 150-156.
[52] D. Bielińska-Wąż et al., 2D-dynamic representation of DNA sequences, Chem. Phys. Lett. 442 (2007), 140-144.
[53] A. Nandy et al., Characteristics of influenza HA-NA interdependence determined through a graphical technique, Curr. Comput. Aided Drug Design 10 (2014), 285-302.
[54] A. Nandy and S. Basak, Prognosis of possible reassortments in recent H5N2 epidemic influenza in USA: implication for computer-assisted surveillance as well as drug/vaccine design, Curr. Comput. Aided Drug Design 11 (2015), 110-116.
[55] D. Panas et al., 2D-dynamic representation of DNA/RNA sequences as a characterization tool of the Zika virus genome, MATCH Commun. Math. Comput. Chem. 77 (2017), 321-332.
[56] D. Panas et al., An application of the 2D-dynamic representation of DNA/RNA sequences to the prediction of influenza a virus subtypes, MATCH Commun. Math. Comput. Chem. 80 (2018), 295-310.
[57] P. Wąż and D. Bielińska-Wąż, 3D-dynamic representation of DNA sequences, J. Mol. Model. 20 (2014), 2141(1-7).
[58] D. Bielińska-Wąż, P. Wąż and D. Panas, Applications of 2D and 3D-dynamic representations of DNA/RNA sequences for a description of genome sequences of viruses, Comb. Chem. High Throughput Screening 25 (2022), 429-438.
[59] P. Wąż and D. Bielińska-Wąż, Non-standard bioinformatics characterization of SARS-CoV-2, Comp. Biol. Med. 131 (2021), 104247(1-14).
[60] D. Bielińska-Wąż et al., 4D-dynamic representation of DNA/RNA sequences: studies on genetic diversity of Echinococcus multilocularis in red foxes in Poland, Life 12 (2022), 877(1-23).
[61] A. Czernieka et al., 20D-dynamic representations of protein sequences, Genomics 107 (2016), 16-23.
[62] A. Kostadinov and G. Kouzaev, A novel processor for artificial intelligence acceleration, WSEAS Trans. Circ. Systems 21 (2022), 125-141.
[63] B. Brejová, T. Vinar and M. Li, Pattern discovery, Introduction to Bioinformatics, S. Krawetz and D. Womble, eds., Humana Press, 2003, pp. 491-522.
[64] R. Mian, M. Shintani and M. Inoue, Hardware-software co-design for decimal multiplication, Computers 10 (2021), 17(1-19).
[65] N. Brisebarre et al., Comparison between binary and decimal floating-point numbers, IEEE Trans. Comput. 65 (2016), 2032-2044.
[66] Matlab® R2020b, version 9.9.0.1477703. [https://se.mathworks.com/products/matlab.html]
[67] Chapter 2. General Structure, The Unicode Standard (6.0 ed.), The Unicode Consortium: Mountain View, California, US.
[68] R. Hamming, Error detecting and error-correcting codes, Bell. Syst. Techn. J. 29 (1950), 147-160.
[69] W. Waggener, Pulse Code Modulation Techniques, Springer-Verlag, 1995.
[70] G. Navarro and M. Raffinot, Flexible Pattern Matching in Strings: Practical Online Search Algorithms for Texts and Biological Sequences, Cambridge University Press, 2002.
[71] V. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Phys. Doklady 10 (1966), 707-710.
[72] E. Gabidulin, Theory of codes with maximum rank distance, Problemy Peredachi Informatsii (Probl. Inform. Trans.) 21 (1985), 3-16.
[73] E. Polityko, Calculation of distance between strings, MATLAB Central File Exchange [https://www.mathworks.com/matlabcentral/fileexchange/17585-calculation-of-distance-between-strings], Retrieved March 3, 2021.
[74] X. Yang et al., Genetic cluster analysis of SARS-CoV-2 and the identification of those responsible for the major outbreaks in various countries, Emerging Microbes and Infect. 9 (2020), 1287-1299.
[75] J. Tzeng, H. H. S. Lu and W. H. Li, Multi-dimensional scaling for large genomic data sets, BMC Bioinformatics 9 (2008), 179(1-17).
[76] A. Taghavi et al., Evaluating geometric definitions of stacking for RNA dinucleoside monophosphates using molecular mechanics calculations, J. Chem. Theory Comput. 18 (2022), 3637-3653.
[77] A. Melkikh and A. Khrennikov, Nontrivial quantum and quantum-like effects in biosystems: Unsolved questions and paradoxes, Progress Biophys. Mol. Biol. 119 (2015), 137-161.
[78] J. Feder, Fractals, Plenum Press, 1988.
[79] C. Berthelsen, J. Glazier and M. Skolnick, Global fractal dimension of human DNA sequences treated as pseudorandom walks, Phys. Rev. A 45 (1992), 8902-8913.
[80] P. Licinio and R. Caligiorne, Inference of phylogenetic distances from DNA-walk divergences, Phys. A 341 (2004), 471-481.
[81] A. Rosas, E. Nogueira Jr. and J. Fontanari, Multifractal analysis of DNA walks and trails, Phys. Rev. E 66 (2002), 061906(1-6).
[82] A. Haimovich et al., Wavelet analysis of DNA walks, J. Comput. Biol. 13 (2006), 1289-1298.
[83] H. Namazi et al., Diagnosis of skin cancer by correlation and complexity analyses of damaged DNA, Oncotarget 6 (2015), 42623-42631.
[84] G. Abramson, H. Cerdeira and C. Bruschi, Fractal properties of DNA walks, Biosystems 49 (1999), 63-70.
[85] C. Cattani, Fractals and hidden symmetries in DNA, Math. Probl. Eng. 2010 (2010), 507056(1-31).
[86] S. Ouadfeul, Multifractal analysis of SARS-CoV-2 coronavirus genomes using the wavelet transforms, BioRxiv preprint: https://doi.org/10.1101/2020.08.15.252411.
[87] B. Hao, H. C. Lee and S. Zhang, Fractals related to long DNA sequences and complete genomes, Chaos Solitons Fractals 11 (2000), 825-836.
[88] Z. Y. Su, T. Wu and S. Y. Wang, Local scaling and multifractality spectrum analysis of DNA sequences - GenBank data analysis, Chaos Solitons Fractals 40 (2009), 1750-1765.
[89] G. Durán-Meza, J. López-García and J. del Río-Correa, The self-similarity properties and multifractal analysis of DNA sequences, Appl. Math. Nonlin. Sci. 4 (2019), 267-278.
[90] M. Swapna and S. Sankararaman, Fractal applications in bio-nanosystems, Bioequiv. Availab. 2 (2019), pp. OABB.000541(1-4).
[91] X. Bin, E. Sargent and S. Kelley, Nanostructuring of sensors determines the efficiency of biomolecular capture, Anal. Chem. 82 (2010), 5928-5931.
[92] J. Chen et al., Research progress of DNA walker and its recent applications in biosensor, TrAC Trends in Anal. Chem. 120 (2019), 115626(1-14).
[93] A. Sadana, Engineering Biosensors, Kinetics and Design Application, Acad. Press, 2001.
[94] P. Grassberger and I. Procaccia, Measuring the strangeness of strange attractors, Phys. D 9 (1983), 189-208.
[95] S. Rasband, Chaotic Dynamics of Nonlinear Systems, Dover Publications, 2015.
[96] B. Henry, N. Lovell and F. Camacho, Nonlinear dynamics time series analyses, Nonlinear Biomedical Signal Processing: Dynamic Analysis and Modeling, M. Akay, ed., IEEE Press, 2000, pp. 1-39.
[97] F. Roueff and J. Véhel, A regularization approach to fractional dimension estimation, Proc. Int. Conf. Fractals 98, Oct. 1998, Valletta, Malta. World Sci., 1998, pp. 1-14.
[98] J. Véhel and P. Legrand, Signal and image processing with Fraclab, Thinking in Patterns, World Sci. (2003), 321-322.
[99] G. Kouzaev, Applications of Advanced Electromagnetics: Components and Systems, Springer-Verlag, 2013.
[100] C. Guidolin et al., Does a self-similarity logic shape the organization of the nervous system? The Fractal Geometry of the Brain, A. Di Ieva, ed., Springer-Verlag, 2016, pp. 138-156.
[101] FracLab 2.2. A Fractal Analysis Toolbox for Signal and Image Processing. [www.project.inria.fr/fraclab]
[102] X. H. Xie et al., A novel genome signature based on inter-nucleotide distances profiles for visualization of metagenomic data, Phys. A 482 (2017), 87-94.
[103] X. Yang et al., Genetic cluster analysis of SARS-CoV-2 and the identification of those responsible for the major outbreaks in various countries, Emerging Microbes and Infect. 9 (2020), 1287-1299.
[104] C. Cao et al., The architecture of the SARS-CoV-2 RNA genome inside virion, Nature Commun. 12 (2021), 3917(1-14).
[105] A. Brant et al., SARS-CoV-2: from its discovery to genome structure, transcription, and replication, Cell and Bioscience 11 (2021), 136(1-17).
[106] C. Wu et al., Structure genomics of SARS-CoV-2 and its Omicron variant: drug design templates for COVID-19, Acta Pharm. Sinica 43 (2022), 3021-3033.
[107] V. Cooper, The coronavirus variants do not seem to be highly variable so far, Sci. American, 2021.
[108] S. El-Kafrawy et al., Enzootic patterns of Middle East respiratory syndrome coronavirus in imported African and local Arabian dromedary camels: a prospective genomic study, The Lancet Planetary Health 3 (2019), e521-e528.
[109] M. Kim et al., An infectious cDNA clone of a growth attenuated Korean isolate of MERS coronavirus KNIH002 in clade B, Emerg. Microbes Infect. 9 (2020), 2714-2720.
[110] V. Dwivedi et al., Genomics, proteomics and evolution of dengue virus, Briefings in Functional Genomics 16 (2017), 217-227.
[111] H. Abe et al., Re-emergence of Dengue virus serotype 3 infections in Gabon in 2016-2017, and evidence for the risk of repeated Dengue virus infections, Int. J. Infect. Diseases 91 (2020), 129-136.
[112] N. Di Paola et al., Viral genomics in Ebola virus research, Nature Rev. Microbiol. 18 (2020), 365-378.
[113] J. Zhang, Visualization for Information Retrieval, Springer-Verlag, 2007.
[114] M. Vračko et al., Cluster analysis of coronavirus sequences using computational sequence descriptors: with applications for SARS, MERS and SARS-CoV-2 (CoVID-19), Curr. Comput. Aided Drug Design 17 (2021), 936-945.
[115] V. Grishkevich and I. Yanai, Gene length and expression level shape genomic novelties, Genome Research 24 (2014), 1497-1503.
[116] T. Stoeger et al., Aging is associated with a systemic length-associated transcriptome imbalance, Nature Aging 2 (2022), 1191-1206.