Tajima-Nei Distance (Gamma Rates and Heterogeneous Patterns)

In real data, nucleotide frequencies often deviate substantially from 0.25. In this case the Tajima-Nei distance (Tajima and Nei 1984) gives a better estimate of the number of nucleotide substitutions than the Jukes-Cantor distance. The basic method assumes equal substitution rates among sites and between transitional and transversional substitutions. Here the first assumption is relaxed: rate variation among sites is modeled using the gamma distribution, and you will need to provide a gamma parameter (a) for computing this distance. When the nucleotide frequencies differ between the sequences, the modified formula (Tamura and Kumar 2002) relaxes the assumption of substitution pattern homogeneity.

 

The Felsenstein-Tajima-Nei model

image\ebx_232399659.gif

 

MEGA provides facilities for computing the following quantities for this method:

d: Transitions + Transversions : Number of nucleotide substitutions per site.

L: No of valid common sites: Number of sites compared.

 

The formulas for computing these quantities are as follows:

Distance

image\ebx_806362806.gif

where p is the proportion of sites with different nucleotides, a is the gamma parameter, and

image\ebx_-1067728265.gif

where x_ij is the relative frequency of the nucleotide pair i and j, and the g_i are the nucleotide frequencies.
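
The quantities defined above (p, a, x_ij and g_i) can be combined in a short Python sketch. The formula images are not reproduced in this text, so the sketch assumes the commonly published gamma-corrected Tajima-Nei estimator, d = a b [(1 - p/b)^(-1/a) - 1] with b = 1/2 [1 - sum of g_i^2 + p^2/c] and c = sum over pairs i < j of x_ij^2 / (2 g_i g_j). It does not implement the Tamura and Kumar (2002) correction for heterogeneous patterns. The function name tajima_nei_gamma and its arguments are hypothetical and are not part of MEGA.

```python
from itertools import combinations

NUCS = "ACGT"

def tajima_nei_gamma(seq_x, seq_y, a):
    """Sketch of a gamma-corrected Tajima-Nei distance (assumed formula).

    Returns (d, L): substitutions per site and number of valid common sites.
    """
    # Keep only sites where both sequences carry an unambiguous nucleotide.
    pairs = [(x, y) for x, y in zip(seq_x.upper(), seq_y.upper())
             if x in NUCS and y in NUCS]
    L = len(pairs)
    if L == 0:
        raise ValueError("no valid common sites")
    # p: proportion of sites with different nucleotides.
    p = sum(x != y for x, y in pairs) / L
    if p == 0:
        return 0.0, L
    # g[i]: nucleotide frequencies averaged over both sequences (assumption).
    g = {n: sum((x == n) + (y == n) for x, y in pairs) / (2 * L) for n in NUCS}
    # c: sum of x_ij^2 / (2 g_i g_j) over the six unordered pairs i < j.
    c = 0.0
    for i, j in combinations(NUCS, 2):
        x_ij = sum({x, y} == {i, j} for x, y in pairs) / L
        if x_ij > 0:
            c += x_ij ** 2 / (2 * g[i] * g[j])
    b = 0.5 * (1 - sum(f * f for f in g.values()) + p * p / c)
    if p >= b:
        raise ValueError("distance undefined: p >= b")
    return a * b * ((1 - p / b) ** (-1 / a) - 1), L
```

A call such as tajima_nei_gamma("ACGTACGT", "ACGTACGA", 1.0) returns the distance together with L. Sites with gaps or ambiguous symbols in either sequence are skipped, which is one possible reading of "valid common sites".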

Variance can be estimated by the bootstrap method.
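
As a rough illustration of the bootstrap approach, the Python sketch below resamples alignment columns with replacement, recomputes the distance for each replicate, and reports the sample variance of the replicate estimates. This is a generic site bootstrap, not MEGA's own implementation. The names bootstrap_variance and p_distance, the default of 500 replicates, and the choice to skip replicates where the distance is undefined are all assumptions. The simple p-distance stands in for the estimator only to keep the example self-contained; any function of two aligned sequences that returns a number can be passed instead.

```python
import random
from statistics import variance

def p_distance(seq_x, seq_y):
    """Stand-in estimator: proportion of sites that differ."""
    return sum(x != y for x, y in zip(seq_x, seq_y)) / len(seq_x)

def bootstrap_variance(seq_x, seq_y, distance=p_distance,
                       replicates=500, seed=None):
    """Sample variance of a pairwise distance over bootstrap replicates.

    Assumes seq_x and seq_y are aligned and of equal length.
    """
    rng = random.Random(seed)
    n = len(seq_x)
    estimates = []
    for _ in range(replicates):
        # Draw n column indices with replacement and rebuild both sequences.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = "".join(seq_x[i] for i in idx)
        boot_y = "".join(seq_y[i] for i in idx)
        try:
            estimates.append(distance(boot_x, boot_y))
        except ValueError:
            # Skip replicates where the estimator is undefined (assumption).
            continue
    return variance(estimates)
```

Passing a fixed seed makes the result reproducible, which helps when comparing runs.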