Models for estimating distances

The evolutionary distance between a pair of sequences is usually measured by the number of nucleotide (or amino acid) substitutions that have occurred between them. Evolutionary distances are fundamental to the study of molecular evolution and are useful for phylogenetic reconstruction and the estimation of divergence times. Most of the widely used methods for estimating distances between nucleotide and amino acid sequences are included in MEGA. The following three sections briefly discuss these methods: nucleotide substitutions, synonymous-nonsynonymous substitutions, and amino acid substitutions. Further details and general guidelines for the use of these methods are given in Nei and Kumar (2000). Note that in addition to the distance estimates, MEGA also computes the standard errors of the estimates, using either analytical formulas or the bootstrap method.
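
To illustrate the two ways of obtaining a standard error mentioned above, the following is a minimal sketch for the simplest distance, the proportion of differing sites (p-distance). It is not MEGA's implementation. The function names, the default of 1000 replicates, and the fixed random seed are assumptions made for the example. The analytical value uses the binomial variance p(1 - p)/n, and the bootstrap value resamples alignment columns with replacement.

```python
import math
import random


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def analytical_se(p, n):
    """Binomial standard error of a proportion p estimated from n sites."""
    return math.sqrt(p * (1.0 - p) / n)


def bootstrap_se(seq1, seq2, replicates=1000, seed=0):
    """Standard error from resampling alignment columns with replacement.

    `replicates` and `seed` are illustrative defaults, not MEGA settings.
    """
    rng = random.Random(seed)
    n = len(seq1)
    estimates = []
    for _ in range(replicates):
        # Draw n column indices with replacement and rebuild both sequences.
        cols = [rng.randrange(n) for _ in range(n)]
        resampled1 = [seq1[i] for i in cols]
        resampled2 = [seq2[i] for i in cols]
        estimates.append(p_distance(resampled1, resampled2))
    mean = sum(estimates) / replicates
    variance = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    a = "A" * 100
    b = "C" * 20 + "A" * 80
    p = p_distance(a, b)
    print("p-distance:", p)
    print("analytical SE:", analytical_se(p, len(a)))
    print("bootstrap SE:", bootstrap_se(a, b))
```

For a long alignment the two estimates should be close. The bootstrap has the advantage that the same resampling loop works for any distance measure, including those with no simple variance formula.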

Distance methods included in MEGA are divided into three categories (Nucleotide, Syn-Nonsynonymous, and Amino Acid):

Nucleotide

Sequences are compared nucleotide-by-nucleotide. These distances can be computed for protein coding and non-coding nucleotide sequences.

 No. of differences

 p-distance

 Jukes-Cantor Model

  with Rate Uniformity Among Sites

  with Rate Variation Among Sites

 Tajima-Nei Model

  with Rate Uniformity and Pattern Homogeneity

  with Rate Variation Among Sites

  with Pattern Heterogeneity Between Lineages

  with Rate Variation and Pattern Heterogeneity

 Kimura 2-Parameter Model

  with Rate Uniformity Among Sites

  with Rate Variation Among Sites

 Tamura 3-Parameter Model

  with Rate Uniformity and Pattern Homogeneity

  with Rate Variation Among Sites

  with Pattern Heterogeneity Between Lineages

  with Rate Variation and Pattern Heterogeneity

 Tamura-Nei Model

  with Rate Uniformity and Pattern Homogeneity

  with Rate Variation Among Sites

  with Pattern Heterogeneity Between Lineages

  with Rate Variation and Pattern Heterogeneity

 Log-Det Method

  with Pattern Heterogeneity Between Lineages

 Maximum Composite Likelihood Model

  with Rate Uniformity and Pattern Homogeneity

  with Rate Variation Among Sites

  with Pattern Heterogeneity Between Lineages

  with Rate Variation and Pattern Heterogeneity
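
As a rough illustration of a nucleotide-by-nucleotide comparison, the sketch below computes the number of differences, the p-distance, the Jukes-Cantor distance (with and without gamma-distributed rate variation among sites), and the Kimura 2-parameter distance, using the standard textbook formulas. It is not MEGA's code. The function names are hypothetical, and skipping every site that is not A, C, G, or T in both sequences is an assumption made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites compared, transitions, transversions).

    Sites containing gaps or ambiguity codes in either sequence are skipped
    (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return n, ts, tv


def p_distance(n, ts, tv):
    """Proportion of compared sites that differ."""
    return (ts + tv) / n


def jukes_cantor(p):
    """d = -(3/4) ln(1 - 4p/3); undefined when p >= 3/4."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jukes_cantor_gamma(p, a):
    """Jukes-Cantor distance with gamma shape parameter a for rate variation."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)


def kimura_2p(P, Q):
    """d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q).

    P and Q are the proportions of transitional and transversional differences.
    """
    w1 = 1.0 - 2.0 * P - Q
    w2 = 1.0 - 2.0 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("Kimura 2-parameter distance is undefined here")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    n, ts, tv = count_differences("ACGTACGTAC", "ACGTACGTGA")
    p = p_distance(n, ts, tv)
    print("differences:", ts + tv, "p-distance:", p)
    print("Jukes-Cantor:", jukes_cantor(p))
    print("Jukes-Cantor (gamma, a=1):", jukes_cantor_gamma(p, 1.0))
    print("Kimura 2-parameter:", kimura_2p(ts / n, tv / n))
```

The corrected distances are always at least as large as the p-distance, because they account for multiple substitutions at the same site. The gamma version grows larger still as the shape parameter decreases.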

Syn-Nonsynonymous

Sequences are compared codon-by-codon. These distances can only be computed for protein-coding sequences or domains.

 Nei-Gojobori Method

 Modified Nei-Gojobori Method

 Li-Wu-Luo Method

 Pamilo-Bianchi-Li Method

 Kumar Method
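
The sketch below illustrates only the first step of a codon-by-codon comparison: walking two aligned coding sequences in steps of three and classifying each differing codon pair as synonymous or nonsynonymous. It is deliberately simpler than any of the methods listed above, which also estimate the numbers of synonymous and nonsynonymous sites and handle codons that differ at more than one position. Here such codons are merely counted. The function name, the result keys, and the use of the standard genetic code are assumptions made for the example.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG ordering.
STANDARD_CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_differences(cds1, cds2, code=STANDARD_CODE):
    """Classify aligned codon pairs from two coding sequences.

    Returns a dict of counts. Codon pairs that differ at more than one
    position are reported under "multiple" and not resolved further;
    codons with gaps or ambiguity codes are reported under "skipped".
    """
    if len(cds1) != len(cds2) or len(cds1) % 3 != 0:
        raise ValueError("sequences must be aligned, with length a multiple of 3")
    counts = {"identical": 0, "synonymous": 0, "nonsynonymous": 0,
              "multiple": 0, "skipped": 0}
    for i in range(0, len(cds1), 3):
        c1 = cds1[i:i + 3].upper()
        c2 = cds2[i:i + 3].upper()
        if c1 not in code or c2 not in code:
            counts["skipped"] += 1
        elif c1 == c2:
            counts["identical"] += 1
        elif sum(a != b for a, b in zip(c1, c2)) > 1:
            counts["multiple"] += 1
        elif code[c1] == code[c2]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    print(classify_codon_differences("ATGTTACCCGGGAAA---",
                                     "ATGCTACCAGAGGGGATG"))
```

Passing a different `code` dictionary would correspond to selecting another genetic code table.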

Amino Acid

Amino acid sequences are compared residue-by-residue. These distances can be computed for protein sequences and protein-coding nucleotide sequences. In the latter case, protein-coding nucleotide sequences are automatically translated using the selected genetic code table.

 No. of differences

 p-distance

 Poisson Model

  with Rate Uniformity Among Sites

  with Rate Variation Among Sites

 Equal Input Model

  with Rate Uniformity and Pattern Homogeneity

  with Rate Variation Among Sites

  with Pattern Heterogeneity Between Lineages

  with Rate Variation and Pattern Heterogeneity

 Dayhoff and JTT Models

  with Rate Uniformity Among Sites

  with Rate Variation Among Sites
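
A residue-by-residue comparison can be sketched as follows for the p-distance and the Poisson model, with and without gamma-distributed rate variation among sites. The formulas are the standard textbook ones, and the code is not taken from MEGA. The function names are hypothetical, the input is assumed to be already-translated protein sequences, and treating "-", "?", and "X" as missing data is an assumption of the example.

```python
import math

MISSING = {"-", "?", "X"}


def aa_p_distance(prot1, prot2):
    """Return (sites compared, differences, proportion of differing residues)."""
    if len(prot1) != len(prot2):
        raise ValueError("sequences must be aligned to the same length")
    n = diffs = 0
    for a, b in zip(prot1.upper(), prot2.upper()):
        if a in MISSING or b in MISSING:
            continue  # skip gaps and unknown residues
        n += 1
        if a != b:
            diffs += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, diffs, diffs / n


def poisson_distance(p):
    """Poisson-corrected distance: d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("Poisson distance is undefined for p >= 1")
    return -math.log(1.0 - p)


def poisson_gamma_distance(p, a):
    """Poisson distance with gamma shape parameter a: d = a[(1 - p)^(-1/a) - 1]."""
    if p >= 1.0:
        raise ValueError("distance is undefined for p >= 1")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


if __name__ == "__main__":
    n, diffs, p = aa_p_distance("MKTAYIAKQR", "MKTAYLAKQK")
    print("differences:", diffs, "p-distance:", p)
    print("Poisson:", poisson_distance(p))
    print("Poisson (gamma, a=1):", poisson_gamma_distance(p, 1.0))
```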