Models for estimating distances

The evolutionary distance between a pair of sequences is usually measured by the number of nucleotide (or amino acid) substitutions that have occurred between them. Evolutionary distances are fundamental to the study of molecular evolution and are useful for phylogenetic reconstruction and the estimation of divergence times. Most of the widely used methods for estimating distances from nucleotide and amino acid sequences are included in MEGA. The following three sections briefly discuss these methods: nucleotide substitutions, synonymous-nonsynonymous substitutions, and amino acid substitutions. Further details and general guidelines for their use are given in Nei and Kumar (2000). Note that, in addition to the distance estimates, MEGA also computes the standard errors of the estimates using either analytical formulas or the bootstrap method.
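The bootstrap approach to standard errors can be illustrated with a minimal Python sketch. It is not MEGA's implementation: the function names, the default of 500 replicates, and the fixed seed are assumptions made for illustration, and the proportion of differing sites stands in for whichever distance measure is chosen.

```python
import random
import statistics


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which the two sequences differ."""
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def bootstrap_se(seq1, seq2, distance=p_distance, replicates=500, seed=0):
    """Estimate the standard error of a pairwise distance by bootstrap.

    Alignment columns are resampled with replacement, the distance is
    recomputed for each pseudo-replicate, and the standard deviation of
    the replicate estimates is returned.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    if replicates < 2:
        raise ValueError("at least two replicates are required")
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    n = len(seq1)
    estimates = []
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]
        s1 = "".join(seq1[i] for i in cols)
        s2 = "".join(seq2[i] for i in cols)
        estimates.append(distance(s1, s2))
    return statistics.stdev(estimates)
```

Any function that takes two aligned sequences and returns a number can be passed as `distance`, so the same resampling loop applies to corrected distances as well.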

Distance methods included in MEGA are divided into three categories (Nucleotide, Syn-Nonsynonymous, and Amino Acid):

Nucleotide

Sequences are compared nucleotide-by-nucleotide. These distances can be computed for protein coding and non-coding nucleotide sequences.

No. of differences

Jukes-Cantor Model

with Rate Uniformity Among Sites

with Rate Variation Among Sites

Tajima-Nei Model

with Rate Uniformity and Pattern Homogeneity

with Rate Variation Among Sites

with Pattern Heterogeneity Between Lineages

with Rate Variation and Pattern Heterogeneity

Kimura 2-Parameter Model

with Rate Uniformity Among Sites

with Rate Variation Among Sites

Tamura 3-Parameter Model

with Rate Uniformity and Pattern Homogeneity

with Rate Variation Among Sites

with Pattern Heterogeneity Between Lineages

with Rate Variation and Pattern Heterogeneity

Tamura-Nei Model

with Rate Uniformity and Pattern Homogeneity

with Rate Variation Among Sites

with Pattern Heterogeneity Between Lineages

with Rate Variation and Pattern Heterogeneity

Log-Det Method

with Pattern Heterogeneity Between Lineages

Maximum Composite Likelihood Model

with Rate Uniformity and Pattern Homogeneity

with Rate Variation Among Sites

with Pattern Heterogeneity Between Lineages

with Rate Variation and Pattern Heterogeneity
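As an illustration of nucleotide-by-nucleotide comparison, the following Python sketch computes the proportion of differences and the textbook Jukes-Cantor and Kimura 2-parameter distances under rate uniformity among sites. It is a sketch under stated assumptions, not MEGA's code: the function names are hypothetical, and sites where either sequence has a gap or an ambiguous base are simply skipped.

```python
import math

NUCLEOTIDES = set("ACGT")
PURINES = set("AG")


def site_counts(seq1, seq2):
    """Return (compared sites, transitions, transversions) for an aligned pair."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: skip sites with gaps or ambiguous bases in either sequence.
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        n += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ."""
    n, ts, tv = site_counts(seq1, seq2)
    return (ts + tv) / n


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        raise ValueError("distance is undefined (sequences too divergent)")
    return -0.75 * math.log(arg)


def kimura_2p(seq1, seq2):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    n, ts, tv = site_counts(seq1, seq2)
    P, Q = ts / n, tv / n
    a, b = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if a <= 0 or b <= 0:
        raise ValueError("distance is undefined (sequences too divergent)")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

For two 10-site sequences that differ by one transition and one transversion, `p_distance` gives 0.2, while both corrected distances come out slightly larger (about 0.23) because they account for multiple substitutions at the same site.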

Syn-Nonsynonymous

Sequences are compared codon-by-codon. These distances can only be computed for protein-coding sequences or domains.

Nei-Gojobori Method

Modified Nei-Gojobori Method

Li-Wu-Luo Method

Pamilo-Bianchi-Li Method

Amino Acid

Amino acid sequences are compared residue-by-residue. These distances can be computed for protein sequences and protein-coding nucleotide sequences. In the latter case, protein-coding nucleotide sequences are automatically translated using the selected genetic code table.

No. of differences

Poisson Model

with Rate Uniformity Among Sites

with Rate Variation Among Sites

Equal Input Model

with Rate Uniformity and Pattern Homogeneity

with Rate Variation Among Sites

with Pattern Heterogeneity Between Lineages

with Rate Variation and Pattern Heterogeneity

Dayhoff and JTT Models

with Rate Uniformity Among Sites

with Rate Variation Among Sites
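For amino acid sequences, the simplest correction for multiple substitutions is the Poisson model. The Python sketch below uses the textbook formulas d = -ln(1 - p) for rate uniformity among sites and d = a[(1 - p)^(-1/a) - 1] for gamma-distributed rate variation with shape parameter a. The function names and the gap-skipping rule are assumptions for illustration and do not describe MEGA's internals.

```python
import math


def aa_p_distance(prot1, prot2, gap_chars="-?X"):
    """Proportion of compared residues that differ between two aligned proteins."""
    if len(prot1) != len(prot2):
        raise ValueError("sequences must be aligned to the same length")
    n = diffs = 0
    for a, b in zip(prot1.upper(), prot2.upper()):
        # Assumption: skip sites with a gap or unknown residue in either sequence.
        if a in gap_chars or b in gap_chars:
            continue
        n += 1
        if a != b:
            diffs += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return diffs / n


def poisson_distance(p):
    """Poisson correction with rate uniformity among sites: d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


def poisson_gamma_distance(p, shape):
    """Poisson correction with gamma rate variation: d = a[(1 - p)^(-1/a) - 1]."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if shape <= 0:
        raise ValueError("gamma shape parameter must be positive")
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)
```

As the gamma shape parameter grows, rate variation among sites diminishes and `poisson_gamma_distance` approaches `poisson_distance` for the same p.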