Kumar Method

This method is a modification of the Pamilo-Bianchi-Li and Comeron (1995) methods and is able to handle some problematic degeneracy class assignments (see a detailed description below). It computes the following quantities:

Synonymous distance

This is the number of synonymous substitutions per synonymous site.

Nonsynonymous distance

This is the number of nonsynonymous substitutions per nonsynonymous site.

Substitutions at the 4-fold degenerate sites

This is the number of substitutions per 4-fold degenerate site. It is useful for measuring the rate of neutral evolution.

Substitutions at the 0-fold degenerate sites

This is the number of substitutions per 0-fold degenerate site. It is useful for measuring the rate of amino acid sequence evolution.

Number of 4-fold degenerate sites

This is the estimated number of 4-fold degenerate sites, computed by averaging the number of 4-fold degenerate sites in the two sequences compared.

Number of 0-fold degenerate sites

This is the estimated number of 0-fold degenerate sites, computed by averaging the number of 0-fold degenerate sites in the two sequences compared.

Difference between synonymous and nonsynonymous distances

This is the difference between the synonymous and nonsynonymous distances. This statistic is useful for conducting tests of selection.

Kumar's modification of the PBL method:

The treatment of arginine and isoleucine codons in the Li-Wu-Luo and Pamilo-Bianchi-Li methods is arbitrary, which can create a problem because arginine codons occur quite frequently. Comeron (1995) addressed this problem by dividing the 2-fold degenerate sites into two groups: 2S-fold and 2V-fold. 2S-fold sites are those at which the transitional change is synonymous and the two transversional changes are nonsynonymous, whereas 2V-fold sites are those at which the transitional change is nonsynonymous and the two transversional changes are synonymous. Although these definitions help in correcting some of the inaccurate classifications of synonymous and nonsynonymous sites (e.g., methionine codons), they do not solve the problem completely. For example, consider mutations at the first nucleotide position of the arginine codon CGG, which produce TGG (Trp), AGG (Arg), or GGG (Gly). The transitional change (C to T) is nonsynonymous. Of the two transversional changes, one (C to A) is synonymous, while the other (C to G) is nonsynonymous. Therefore, this nucleotide site is neither a 2S-fold nor a 2V-fold site. Under the standard genetic code, the same holds for the first position of four arginine codons (CGA, CGG, AGA, and AGG) and the third position of two isoleucine codons (ATT and ATC), so these sites cannot be assigned to any of the Comeron (1995) categories. For this reason, Comeron (personal communication) used a more complicated classification of codons when he wrote his computer program. For example, the first position of the arginine codon CGG was assigned to a 2V-fold site with a probability of one-third and to a 0-fold site with a probability of two-thirds. Similar assignments are used by W.-H. Li (personal communication) in his computer program.
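The CGG example can be checked mechanically. The following Python sketch is illustrative only and is not taken from any of the programs mentioned above. It assumes the standard genetic code, counts a change to a stop codon as nonsynonymous, and uses a hypothetical helper name, comeron_class, that returns "unassigned" for sites that fit none of the 0-fold, 2S-fold, 2V-fold, or 4-fold definitions.

```python
from itertools import product

# Assumption: standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Transition partner of each base (purine <-> purine, pyrimidine <-> pyrimidine).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def comeron_class(codon, pos):
    """Classify position pos (0, 1, or 2) of a sense codon (hypothetical helper)."""
    aa = CODE[codon]
    base = codon[pos]

    def is_syn(new_base):
        mutant = codon[:pos] + new_base + codon[pos + 1:]
        # Assumption: a change to a stop codon ("*") is treated as nonsynonymous.
        return CODE[mutant] == aa

    ts_syn = is_syn(TRANSITION[base])
    tv_syn = [is_syn(b) for b in BASES if b not in (base, TRANSITION[base])]

    if ts_syn and all(tv_syn):
        return "4-fold"
    if not ts_syn and not any(tv_syn):
        return "0-fold"
    if ts_syn and not any(tv_syn):
        return "2S-fold"
    if not ts_syn and all(tv_syn):
        return "2V-fold"
    return "unassigned"


# Every (codon, 1-based position) of a sense codon that fits no category.
unassigned = sorted(
    (codon, pos + 1)
    for codon, aa in CODE.items()
    if aa != "*"
    for pos in range(3)
    if comeron_class(codon, pos) == "unassigned"
)

print(comeron_class("CGG", 0))
print(unassigned)
```

Under these assumptions the first position of CGG is reported as unassigned, which matches the reasoning in the text. The list printed at the end shows every site of this kind for the code table used.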

Since the nucleotide site assignments discussed above are quite arbitrary and may not apply to all known genetic code tables, Kumar developed another method that uses the PBL method for any genetic code table. In this version, nucleotide sites are first classified into 0-fold, 2-fold, and 4-fold degenerate sites. The 2-fold degenerate sites are further subdivided into simple 2-fold and complex 2-fold degenerate sites. Simple 2-fold sites are those at which the transitional change results in a synonymous substitution and the two transversional changes result in nonsynonymous substitutions. All other 2-fold sites, including those for the three isoleucine codons, belong to the complex 2-fold site category. If we use this definition, all nucleotide sites can be classified into the five groups shown in the following table.

Table.

Degeneracy ->	0-fold	Simple 2-fold	Complex 2-fold		4-fold
No. of sites ->	L0	L2S	L2C		L4
			Syn	Nonsyn
Transition (s)	s0	s2	s2S	s2N	s4
Transversion (v)	v0	v2	v2S	v2N	v4

Here, L0, L2S, L2C, and L4 are the numbers of 0-fold, simple 2-fold, complex 2-fold, and 4-fold degenerate sites, respectively.
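The site classification described above can be sketched in Python for an arbitrary code table. This is a minimal illustration, not the program's actual implementation. The function names (make_code, site_class, count_sites, average_counts) are hypothetical, the code table is passed as a 64-character amino acid string in T, C, A, G codon order, a change to a stop codon is assumed to be nonsynonymous, and stop codons and codons with ambiguous bases are simply skipped.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

# Standard genetic code; another table can be supplied as a different string.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def make_code(amino_string):
    """Build a codon -> amino acid map from a 64-character string."""
    return {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), amino_string)}


def site_class(code, codon, pos):
    """Return the degeneracy class of position pos (0, 1, or 2) of a sense codon."""
    aa = code[codon]
    base = codon[pos]
    # For each alternative base: is the resulting change synonymous?
    syn = {
        b: code[codon[:pos] + b + codon[pos + 1:]] == aa
        for b in BASES
        if b != base
    }
    n_syn = sum(syn.values())
    if n_syn == 0:
        return "0-fold"
    if n_syn == 3:
        return "4-fold"
    # Simple 2-fold: only the transition is synonymous.
    if n_syn == 1 and syn[TRANSITION[base]]:
        return "simple 2-fold"
    # Everything else is placed in the complex 2-fold class.
    return "complex 2-fold"


def count_sites(code, seq):
    """Count sites per class in a coding sequence, skipping stop/unknown codons."""
    counts = Counter()
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if code.get(codon, "*") == "*":
            continue
        for pos in range(3):
            counts[site_class(code, codon, pos)] += 1
    return counts


def average_counts(c1, c2):
    """Average the per-class site counts of the two sequences compared."""
    return {k: (c1[k] + c2[k]) / 2 for k in set(c1) | set(c2)}


code = make_code(STANDARD)
print(site_class(code, "CGG", 0))
print(dict(count_sites(code, "ATGCGGATTTTTGGG")))
```

With the standard code, the first position of CGG and the third position of ATT both fall into the complex 2-fold class. Averaging the counts from the two sequences gives site numbers that play the role of L0, L2S, L2C, and L4.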

Once this table is filled using the observed counts for a given pair of sequences, we compute the proportions of transitional (Pi) and transversional (Qi) differences for the i-fold degenerate site in the following way:

$image\kumarmeth_d1.gif$	$image\kumarmeth_d4.gif$
$image\kumarmeth_d2.gif$	$image\kumarmeth_d5.gif$
$image\kumarmeth_d3.gif$	$image\kumarmeth_d6.gif$

From these quantities, we compute Ai and Bi as in the PBL method. Then, setting L2 = L2C + L2S, we apply the formulas of the PBL method.
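The final step can be sketched as follows. The sketch takes Pi and Qi as already computed inputs and does not reproduce the formulas shown above. It assumes the usual form of the PBL method, in which Ai and Bi are the Kimura two-parameter estimates of transitional and transversional substitutions per site, the synonymous distance is (L2 A2 + L4 A4)/(L2 + L4) + B4, and the nonsynonymous distance is A0 + (L0 B0 + L2 B2)/(L0 + L2). The function names and the numeric inputs are hypothetical and purely illustrative.

```python
import math


def k2p_components(P, Q):
    """Return (A, B) for transition proportion P and transversion proportion Q.

    Assumes the Kimura two-parameter correction used in the PBL method.
    """
    a = 1.0 / (1.0 - 2.0 * P - Q)
    b = 1.0 / (1.0 - 2.0 * Q)
    A = 0.5 * math.log(a) - 0.25 * math.log(b)
    B = 0.5 * math.log(b)
    return A, B


def pbl_distances(L0, L2S, L2C, L4, P, Q):
    """Return (dS, dN) given site counts and proportions keyed by 0, 2, and 4."""
    L2 = L2S + L2C  # simple and complex 2-fold sites are pooled
    A0, B0 = k2p_components(P[0], Q[0])
    A2, B2 = k2p_components(P[2], Q[2])
    A4, B4 = k2p_components(P[4], Q[4])
    dS = (L2 * A2 + L4 * A4) / (L2 + L4) + B4
    dN = A0 + (L0 * B0 + L2 * B2) / (L0 + L2)
    return dS, dN


# Illustrative (made-up) inputs, not taken from real data.
P = {0: 0.02, 2: 0.08, 4: 0.10}
Q = {0: 0.01, 2: 0.03, 4: 0.12}
dS, dN = pbl_distances(L0=200, L2S=50, L2C=10, L4=40, P=P, Q=Q)
print(dS, dN, dS - dN)
```

The last printed value is the difference between the synonymous and nonsynonymous distances, the quantity used in tests of selection. The logarithms are undefined when 2P + Q or 2Q reaches 1, so highly divergent sequence pairs need to be handled separately.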