Kumar Method

This method is a modification of the Pamilo-Bianchi-Li (PBL) and Comeron (1995) methods that handles some problematic degeneracy class assignments (described in detail below). It computes the following quantities:

Synonymous distance

This is the number of synonymous substitutions per synonymous site.

Nonsynonymous distance

This is the number of nonsynonymous substitutions per nonsynonymous site.

Substitutions at the 4-fold degenerate sites

This is the number of substitutions per 4-fold degenerate site. Because every substitution at a 4-fold degenerate site is synonymous, it is useful for measuring the rate of neutral evolution.

Substitutions at the 0-fold degenerate sites

This is the number of substitutions per 0-fold degenerate site. Because every substitution at a 0-fold degenerate site is nonsynonymous, it is useful for measuring the rate of amino acid sequence evolution.

Number of 4-fold degenerate sites

This is the estimated number of 4-fold degenerate sites, computed by averaging the numbers of 4-fold degenerate sites in the two sequences being compared.

Number of 0-fold degenerate sites

This is the estimated number of 0-fold degenerate sites, computed by averaging the numbers of 0-fold degenerate sites in the two sequences being compared.

Difference between synonymous and nonsynonymous distances

This is the difference between the synonymous and nonsynonymous distances. This statistic is useful for conducting tests of selection.

Kumar’s modification of the PBL method:

The treatment of arginine and isoleucine codons in the Li-Wu-Luo and the Pamilo-Bianchi-Li methods is arbitrary, which can cause problems because arginine codons occur quite frequently. Comeron (1995) addressed this problem by dividing the 2-fold degenerate sites into two groups: 2S-fold and 2V-fold. A 2S-fold site is one at which the transitional change is synonymous and the two transversional changes are nonsynonymous, whereas a 2V-fold site is one at which the transitional change is nonsynonymous and the two transversional changes are synonymous. Although these definitions help correct some inaccurate classifications of synonymous and nonsynonymous sites (e.g., methionine codons), they do not solve the problem completely.

For example, consider mutations at the first nucleotide position of the arginine codon CGG, which produce TGG (Trp), AGG (Arg), or GGG (Gly). The transitional change (C to T) is nonsynonymous. Of the two transversional changes, one (C to A) is synonymous and the other (C to G) is nonsynonymous. Therefore, this nucleotide site is neither a 2S-fold nor a 2V-fold site. In the standard genetic code, the same holds for the first position of four arginine codons (CGA, CGG, AGA, and AGG) and the third position of two isoleucine codons (ATT and ATC), so none of these sites can be assigned to any of the Comeron (1995) categories. For this reason, Comeron (personal communication) used a more complicated classification of codons when he wrote his computer program. For example, the first position of the arginine codon CGG was assigned to a 2V-fold site with a probability of one-third and to a 0-fold site with a probability of two-thirds. Similar assignments are used by W.-H. Li (personal communication) in his computer program.
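The CGG example can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and DNA (T rather than U) notation; the helper name `changes_at` and the table layout are illustrative choices, not part of any published program.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def changes_at(codon, pos):
    """List (mutant codon, change type, is_synonymous) for one codon position."""
    out = []
    for b in BASES:
        if b == codon[pos]:
            continue
        mutant = codon[:pos] + b + codon[pos + 1:]
        # A transition keeps the base class (purine or pyrimidine) unchanged.
        same_class = (b in PURINES) == (codon[pos] in PURINES)
        kind = "transition" if same_class else "transversion"
        out.append((mutant, kind, CODE[mutant] == CODE[codon]))
    return out


# First position of the arginine codon CGG.
for row in changes_at("CGG", 0):
    print(row)
```

The transition is nonsynonymous and exactly one of the two transversions is synonymous, so the site matches neither the 2S-fold nor the 2V-fold definition.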

Since the nucleotide site assignments discussed above are quite arbitrary and may not apply to all known genetic code tables, Kumar developed another method that extends the PBL method to any genetic code table. In this version, nucleotide sites are first classified into 0-fold, 2-fold, and 4-fold degenerate sites. The 2-fold degenerate sites are further subdivided into simple 2-fold and complex 2-fold degenerate sites. Simple 2-fold sites are those at which the transitional change results in a synonymous substitution and the two transversional changes result in nonsynonymous substitutions. All other 2-fold sites, including those for the three isoleucine codons, belong to the complex 2-fold site category. With this definition, all nucleotide sites can be classified into the five groups shown in the following table.

Table.

Degeneracy ->       0-fold    Simple 2-fold    Complex 2-fold      4-fold
No. of sites ->     L0        L2S              L2C                 L4
                                               Syn       Nonsyn
Transition (s)      s0        s2               s2S       s2N       s4
Transversion (v)    v0        v2               v2S       v2N       v4

Here, L0, L2S, L2C, and L4 are the numbers of 0-fold, simple 2-fold, complex 2-fold, and 4-fold degenerate sites, respectively.
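The site classification described above, and the averaging of site counts over the two sequences, can be sketched as follows. This is an illustration under stated assumptions, not the program's actual implementation: it assumes the standard genetic code by default, treats a stop codon as its own amino acid class, and expects aligned, upper-case coding DNA without gaps or ambiguity codes. The names `site_class`, `count_sites`, and `average_sites` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons (assumption:
# a stop is treated as a distinct class when testing for synonymy).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def site_class(codon, pos, code=CODE):
    """Return '0', '2S', '2C', or '4' for one position of a codon."""
    ts_syn = tv_syn = 0
    for b in BASES:
        if b == codon[pos]:
            continue
        mutant = codon[:pos] + b + codon[pos + 1:]
        if code[mutant] == code[codon]:
            if (b in PURINES) == (codon[pos] in PURINES):
                ts_syn += 1  # synonymous transition
            else:
                tv_syn += 1  # synonymous transversion
    n_syn = ts_syn + tv_syn
    if n_syn == 0:
        return "0"
    if n_syn == 3:
        return "4"
    # Simple 2-fold: the transition is synonymous, both transversions are not.
    return "2S" if (ts_syn == 1 and tv_syn == 0) else "2C"


def count_sites(seq, code=CODE):
    """Count sites of each degeneracy class in one coding sequence."""
    counts = {"0": 0, "2S": 0, "2C": 0, "4": 0}
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        for pos in range(3):
            counts[site_class(codon, pos, code)] += 1
    return counts


def average_sites(seq1, seq2, code=CODE):
    """Average the per-class site counts of the two sequences (L0, L2S, L2C, L4)."""
    c1, c2 = count_sites(seq1, code), count_sites(seq2, code)
    return {k: (c1[k] + c2[k]) / 2 for k in c1}
```

Passing a different codon-to-amino-acid dictionary as `code` applies the same rules to another genetic code table. For instance, `site_class("CGG", 0)` and the third position of all three isoleucine codons fall into the complex 2-fold class under the standard code.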

Once this table is filled using the observed counts for a given pair of sequences, we compute the proportions of transitional (Pi) and transversional (Qi) differences for the i-fold degenerate site in the following way:

[Equations for Pi and Qi, originally shown as images: kumarmeth_d1.gif through kumarmeth_d6.gif]

From these quantities, we compute Ai and Bi as in the PBL method. Then, using L2 = L2C + L2S, we apply the formulas of the PBL method to obtain the synonymous and nonsynonymous distances.
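This final step can be sketched as below. The formulas are not written out in this section; the sketch assumes the commonly published form of the PBL estimators, with a_i = 1/(1 - 2Pi - Qi), b_i = 1/(1 - 2Qi), Ai = (1/2) ln a_i - (1/4) ln b_i, and Bi = (1/2) ln b_i. The function name `pbl_distances` and the dictionary-based interface are hypothetical, and the Pi and Qi values are taken as given inputs.

```python
import math


def pbl_distances(L, P, Q):
    """Return (synonymous, nonsynonymous) distances from PBL-style inputs.

    L, P, Q are dicts keyed by degeneracy class 0, 2, 4, where
    L[2] = L2S + L2C. Assumes 1 - 2*P[i] - Q[i] > 0 and 1 - 2*Q[i] > 0;
    otherwise the logarithms are undefined and math.log raises ValueError.
    """
    A, B = {}, {}
    for i in (0, 2, 4):
        a = 1.0 / (1.0 - 2.0 * P[i] - Q[i])
        b = 1.0 / (1.0 - 2.0 * Q[i])
        A[i] = 0.5 * math.log(a) - 0.25 * math.log(b)  # transitional component
        B[i] = 0.5 * math.log(b)                       # transversional component
    # Synonymous distance: weighted transitions at 2-fold and 4-fold sites,
    # plus transversions at 4-fold sites.
    d_s = (L[2] * A[2] + L[4] * A[4]) / (L[2] + L[4]) + B[4]
    # Nonsynonymous distance: transitions at 0-fold sites, plus weighted
    # transversions at 0-fold and 2-fold sites.
    d_n = A[0] + (L[0] * B[0] + L[2] * B[2]) / (L[0] + L[2])
    return d_s, d_n


# Illustrative inputs only; these numbers are made up for the example.
L = {0: 200.0, 2: 60.0, 4: 40.0}
P = {0: 0.01, 2: 0.05, 4: 0.10}
Q = {0: 0.01, 2: 0.02, 4: 0.05}
d_s, d_n = pbl_distances(L, P, Q)
print(d_s, d_n, d_s - d_n)
```

The last printed value is the difference between the synonymous and nonsynonymous distances, the quantity used in tests of selection.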

See also Nei and Kumar (2000), page 64.