Keywords and phrases: substitution matrix, affine gaps, Monte Carlo method, extreme value distribution, Markov chains, Benjamini-Hochberg procedure.
Received: September 20, 2022; Accepted: November 26, 2022; Published: March 20, 2023
How to cite this article: Rajashree Chaurasia and Udayan Ghose, On the statistical significance of pairwise global alignments of nucleotide sequences, JP Journal of Biostatistics 23(1) (2023), 51-76. http://dx.doi.org/10.17654/0973514323004
This open access article is licensed under the Creative Commons Attribution 4.0 International License.