Equal Input Model (Heterogeneous Patterns)

In real data, the 20 amino acids usually occur at unequal frequencies. In this case, a correction based on the equal input model gives a better estimate of the number of amino acid substitutions than the Poisson correction distance does. Note that this model assumes equal substitution rates among sites. When the amino acid frequencies differ between the sequences being compared, the modified formula of Tamura and Kumar (2002) reduces the estimation bias.

MEGA provides facilities for computing the following quantities:

Quantity	Description
d: distance	Number of amino acid substitutions per site.
L: No of valid common sites	Number of sites compared.

Formulas used are:

Distance

$image\ebx_935352493.gif$

where p is the proportion of sites at which the two sequences have different amino acids, gXi is the frequency of amino acid i in sequence X, gi is the average frequency of amino acid i for the pair of sequences, and

$image\ebx_827147449.gif$
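
As a rough illustration of how these quantities fit together, the Python sketch below computes d and L for one pair of aligned sequences. It is not MEGA's code. The formula images are not reproduced in this text, so the sketch assumes the commonly cited form of the Tamura and Kumar (2002) modification, d = -b ln(1 - p/F) with b = 1 - sum(gi^2) and F = 1 - sum(gXi * gYi), where gYi is the frequency of amino acid i in the second sequence. The function name, the set of characters treated as missing data, and the handling of undefined distances are also assumptions made for this example.

```python
import math
from collections import Counter

# Characters treated as gaps or missing data (an assumption for this sketch).
MISSING = {"-", "?", "X"}

def equal_input_heterogeneous_distance(seq_x, seq_y):
    """Return (d, L) for two aligned amino acid sequences.

    Assumed form: d = -b * ln(1 - p / F), with
    b = 1 - sum(g_i ** 2) and F = 1 - sum(g_Xi * g_Yi).
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only sites where both sequences have a residue (valid common sites).
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper())
             if a not in MISSING and b not in MISSING]
    L = len(pairs)
    if L == 0:
        raise ValueError("no valid common sites")

    # p: proportion of sites with different amino acids.
    p = sum(a != b for a, b in pairs) / L
    if p == 0:
        return 0.0, L

    # Per-sequence amino acid frequencies over the compared sites.
    count_x = Counter(a for a, _ in pairs)
    count_y = Counter(b for _, b in pairs)
    residues = set(count_x) | set(count_y)
    g_x = {r: count_x[r] / L for r in residues}
    g_y = {r: count_y[r] / L for r in residues}

    # b uses the pair-averaged frequencies; F uses the cross product.
    b = 1.0 - sum(((g_x[r] + g_y[r]) / 2) ** 2 for r in residues)
    F = 1.0 - sum(g_x[r] * g_y[r] for r in residues)

    if p >= F:
        raise ValueError("distance is undefined because p >= F")
    return -b * math.log(1.0 - p / F), L

d, L = equal_input_heterogeneous_distance("ACDEFGHIKL", "ACDEFGHIKV")
print(f"d = {d:.4f}, L = {L}")
```

When both sequences have the same amino acid frequencies, F equals b and the expression reduces to the ordinary equal input correction, which is a quick way to sanity-check the assumed form.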

The variance of d can be estimated by the bootstrap method.
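
The bootstrap estimate can be sketched as follows: alignment columns are resampled with replacement, the distance is recomputed for each pseudo-replicate, and the sample variance of those estimates is reported. This is a minimal illustration rather than MEGA's implementation. The function names, the default of 500 replicates, and the use of a plain p-distance as the stand-in distance function are assumptions made to keep the example short; any pairwise distance function that returns a number can be passed in its place.

```python
import random

def bootstrap_variance(seq_x, seq_y, distance_fn, replicates=500, seed=None):
    """Estimate the variance of a pairwise distance by resampling sites."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    if replicates < 2:
        raise ValueError("at least two replicates are needed")

    rng = random.Random(seed)
    n = len(seq_x)
    estimates = []
    for _ in range(replicates):
        # Draw n alignment columns with replacement; both sequences
        # use the same columns so that sites stay paired.
        cols = [rng.randrange(n) for _ in range(n)]
        resampled_x = "".join(seq_x[i] for i in cols)
        resampled_y = "".join(seq_y[i] for i in cols)
        estimates.append(distance_fn(resampled_x, resampled_y))

    # Unbiased sample variance of the replicate estimates.
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)

def p_distance(seq_x, seq_y):
    """Stand-in distance for the example: proportion of differing sites."""
    return sum(a != b for a, b in zip(seq_x, seq_y)) / len(seq_x)

variance = bootstrap_variance("ACDEFGHIKL" * 3, "ACDEFGHIKV" * 3,
                              p_distance, replicates=200, seed=1)
print(f"bootstrap variance = {variance:.6f}")
```

A distance function that can be undefined for some resampled data sets (for example, when the logarithm's argument is not positive) would need those replicates handled explicitly; this sketch simply lets the error propagate.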