Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification 论文

2000Journal of Computational Biology引用 246

RNA and protein synthesis mechanismsAlgorithms and Data CompressionGenomics and Phylogenetic Studies

生物科技 Genomics and Phylogenetic Studies Algorithms and Data Compression RNA and protein synthesis mechanisms

作者

摘要

This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs may be described as an ordered collection of p > or = 1 "boxes" (each box corresponding to one part of the structured motif), p substitution rates (one for each box) and p - 1 intervals of distance (one for each pair of successive boxes in the collection). The contents of the boxes--that is, the motifs themselves--are unknown at the start of the algorithm. This is precisely what the algorithms are meant to find. A suffix tree is used for finding such motifs. The algorithms are efficient enough to be able to infer site consensi, such as, for instance, promoter sequences or regulatory sites, from a set of unaligned sequences corresponding to the noncoding regions upstream from all genes of a genome. In particular, both algorithms time complexity scales linearly with N2n where n is the average length of the sequences and N their number. An application to the identification of promoter and regulatory consensus sequences in bacterial genomes is shown.

作者查看全部 (2)

Marie-France Sagot

Laurent Marsan

Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification 论文

摘要

作者查看全部 (2)

相关技术查看全部 (1)

相关事件

相关文章