On the use of nucleic acid sequences to infer early branchings in the tree of life

1995. Molecular Biology and Evolution. Cited by 218. Top-tier venue.
Genomics and Phylogenetic Studies; Gene Expression and Cancer Classification; Algorithms and Data Compression

Abstract

Simplifying assumptions made in various tree reconstruction methods (notably rate constancy among nucleotide sites, and homogeneity and stationarity of the substitution process) are clearly violated when nucleotide sequences are used to infer distant relationships. As previous authors have pointed out, tree reconstruction methods based on such oversimplified assumptions can give misleading results. In this paper, we used a discretized gamma distribution to account for variable rates of substitution among sites and built models that allowed for unequal base frequencies in different sequences. The models were nonhomogeneous Markov-process models, which assume different patterns of substitution in different parts of the tree. We analyzed small-subunit rRNA data from four species, in which base frequencies differed considerably among sequences and rates of substitution were highly variable among sites. Parameters in the models were estimated by maximum likelihood, and models were compared by the likelihood-ratio test. The nonhomogeneous models fit the data significantly better than the homogeneous models, even though they involve many parameters. They also appeared to produce reasonable estimates of the phylogenetic tree; in particular, they seemed able to identify the root of the tree.
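
Two ingredients named in the abstract, the discretized gamma distribution for among-site rate variation and the likelihood-ratio test, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes equal-probability rate categories, each represented by its mean rate (the abstract does not say whether means or medians were used), and a chi-square approximation for the test statistic of nested models. The function names are hypothetical, and the log-likelihood values in the usage example are made up for illustration.

```python
import math


def reg_lower_gamma(a, x, tol=1e-14, max_iter=100000):
    """Regularized lower incomplete gamma P(a, x), by its power series.

    Adequate for moderate x; not tuned for very large arguments.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_iter:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(p, alpha):
    """Quantile of a gamma distribution with shape alpha and mean 1 (bisection)."""
    def cdf(r):
        return reg_lower_gamma(alpha, alpha * r)

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rate of each of k equal-probability categories (assumed variant).

    The gamma distribution has shape alpha and rate alpha, so its mean is 1
    and the category rates also average to 1.
    """
    cuts = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # Integral of r*f(r) up to c equals P(alpha + 1, alpha * c) when the mean is 1.
    upper = [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    lower = [0.0] + upper[:-1]
    return [k * (u - l) for u, l in zip(upper, lower)]


def lrt_pvalue(lnl_null, lnl_alt, df):
    """P-value of a likelihood-ratio test under a chi-square approximation.

    lnl_null and lnl_alt are maximized log-likelihoods of nested models;
    df is the difference in the number of free parameters.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return 1.0
    return 1.0 - reg_lower_gamma(df / 2.0, stat / 2.0)


if __name__ == "__main__":
    # Four rate categories for a strongly skewed rate distribution.
    print(discrete_gamma_rates(0.5, 4))
    # Hypothetical log-likelihoods: homogeneous vs. nonhomogeneous model.
    print(lrt_pvalue(-3250.0, -3238.5, 6))
```

In a full likelihood calculation, the site likelihood would be averaged over the category rates returned by `discrete_gamma_rates`, with the shape parameter `alpha` estimated jointly with the other model parameters. A small p-value from `lrt_pvalue` would favor the more parameter-rich model, provided the models are nested and the chi-square approximation is reasonable.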