|
Vol 48(2014) N 5 p. 749-756; DOI 10.1134/S0026893314050021
E.A. Borzov1*, A.V. Marakhonov1, M.V. Ivanov2, P.B. Drozdova3, A.V. Baranova1,2,4**, M.Y. Skoblov1,2,5
RANDTRAN: Random Transcriptome Sequence Generator That Accounts for Partition Specific Features in Eukaryotic mRNA Datasets
1 Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, 115478 Russia
2 Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow Region, 141700 Russia
3 Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, 199034 Russia
4 School of Systems Biology, George Mason University, Fairfax, USA
5 Moscow State Medical and Dental University, Moscow, 127473 Russia
*eborzov@generesearch.ru **aancha@gmail.com
Received - 2014-03-13; Accepted - 2014-04-09
The generation of true random and pseudorandom control sequences is an important problem in computational biology. Available random sequence generators differ in their underlying probabilistic models, which often remain undisclosed to users. Sequences produced under different probabilistic models differ substantially, yet they are commonly used as baselines for evaluating motif frequencies. Moreover, modern bioinformatics studies often require not a relatively simple continuous control sequence but a matching control transcriptome that emulates the partition into ORFs and 5'- and 3'-UTRs, as well as the proportion of non-coding RNAs in the model transcriptome. Here we describe RANDTRAN, a novel random sequence generation tool that accounts for the length distribution of the 5' and 3' non-translated regions in a given transcriptome and for the partition-specific di- and trinucleotide compositions of translated and non-translated regions. RANDTRAN outputs matching control transcriptomes as ready-to-use input files compatible with the UCSC Genome Browser.
These features may be useful for generating control sequence sets for common types of computational analysis of sequence motifs within various sets of RNAs. RANDTRAN is available for free download at http://www.generesearch.ru/images/Randtran.rar
Keywords: random sequence generation, probabilistic models, transcriptome, sequence motifs |
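
The idea of partition-specific composition summarized in the abstract can be illustrated with a minimal sketch. It is not RANDTRAN's actual algorithm, which the abstract does not specify. As an assumption, each partition (5'-UTR, ORF, 3'-UTR) is modeled as a first-order Markov chain fitted to that partition's dinucleotide counts, and each partition length is drawn from the observed lengths. All names here (`fit_markov1`, `generate`, `random_transcript`, the keys `utr5`, `orf`, `utr3`) are hypothetical.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "ACGU"  # assumed RNA alphabet for this illustration


def fit_markov1(seqs):
    """Count first bases and base-to-base transitions (dinucleotides)."""
    first = Counter()
    trans = defaultdict(Counter)
    for s in seqs:
        if not s:
            continue
        first[s[0]] += 1
        for a, b in zip(s, s[1:]):
            trans[a][b] += 1
    return first, trans


def _draw(counter, rng):
    """Draw a base in proportion to its count; uniform if nothing was observed."""
    if not counter:
        return rng.choice(ALPHABET)
    bases = list(counter)
    weights = [counter[b] for b in bases]
    return rng.choices(bases, weights=weights, k=1)[0]


def generate(model, length, rng):
    """Emit a sequence of the given length from a fitted first-order chain."""
    first, trans = model
    if length <= 0:
        return ""
    seq = [_draw(first, rng)]
    while len(seq) < length:
        seq.append(_draw(trans.get(seq[-1], Counter()), rng))
    return "".join(seq)


def random_transcript(partitions, rng):
    """Build one control transcript, one independently modeled partition at a time.

    `partitions` maps 'utr5', 'orf', 'utr3' to lists of example sequences.
    Each length is sampled from the example lengths of that partition.
    """
    parts = {}
    for name in ("utr5", "orf", "utr3"):
        examples = partitions[name]
        model = fit_markov1(examples)
        length = len(rng.choice(examples))
        parts[name] = generate(model, length, rng)
    return parts


if __name__ == "__main__":
    rng = random.Random(0)
    data = {
        "utr5": ["GGCGCC", "GCGGCGCA"],
        "orf": ["AUGGCCAAGUAA", "AUGAAAGCCGAGUGA"],
        "utr3": ["AAUAAAUUU", "UUUAUUUAAAA"],
    }
    print(random_transcript(data, rng))
```

This sketch preserves only dinucleotide statistics and partition lengths. It does not enforce start or stop codons, reading frame, or the trinucleotide composition of translated regions mentioned in the abstract; a fuller model would need a codon-aware step for the ORF partition.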