Vol 42 (2008), N 1, p. 133-145

O.M. Korzinov1, T.V. Astakhova2,3, P.K. Vlasov4, M.A. Roytberg2,3

Statistical analysis of DNA sequences in the neighborhood of splice sites

1 Moscow Institute of Physics and Technology, Dolgoprudnyi, Moscow Region, 141700, Russia
2 Institute of the Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
3 Pushchino State University, Pushchino, Moscow Region, 142290, Russia
4 Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 119991, Russia

Received: 2007-03-22; Accepted: 2007-05-16

Prediction of genes and their exon-intron structure in large eukaryotic genomic sequences is one of the central problems of mathematical biology. Solving this problem requires, in particular, high-accuracy recognition of splice sites. Statistical analysis of a database of human gene fragments containing splice sites was used to describe some characteristic features of the nucleotide sequences in the neighborhood of splice sites: the frequencies of all nucleotides and dinucleotides were determined, and those whose frequencies are increased or decreased in comparison to a random sequence were identified. The results can be used in sequence annotation, splice site prediction, and recognition of the exon-intron structure of genes.

Keywords: splice sites, exon-intron structure of a gene, statistical sequence analysis
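
The kind of frequency comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names are hypothetical, the input windows are toy sequences rather than data from the paper, and the "random sequence" baseline is assumed here to be an independence model in which the expected frequency of a dinucleotide is the product of the frequencies of its two nucleotides.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequencies of all length-k words over a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):  # skip words with ambiguous bases
                counts[word] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def dinucleotide_enrichment(sequences):
    """Observed/expected ratio for each dinucleotide that occurs.

    Assumption: 'expected' is the product of the single-nucleotide
    frequencies (independence model), one possible reading of
    'a random sequence'.
    """
    mono = kmer_frequencies(sequences, 1)
    di = kmer_frequencies(sequences, 2)
    return {w: f / (mono[w[0]] * mono[w[1]]) for w, f in di.items()}


# Hypothetical toy windows around donor sites, not data from the paper.
windows = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
for word, ratio in sorted(dinucleotide_enrichment(windows).items()):
    print(word, round(ratio, 2))
```

A ratio above 1 marks a dinucleotide that occurs more often than the independence baseline predicts, and a ratio below 1 marks one that occurs less often. A real analysis would also need a significance test and position-specific counts, both of which this sketch omits.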