Analysis Preferences (Z-test of Selection)

In this dialog box, you can view and select options in the Options Summary. Options are organized in logical sections, and a yellow row indicates that you have a choice for that particular attribute. The three primary sets of options available in this dialog box are Analysis, Substitution Model, and Data Subset to Use:

Analysis

Analysis Scope

Use this option to specify whether the analysis is conducted for each pair of sequences, for the overall average across all sequences, or within sequence groups (available only if sequence groups have been defined).

Test Hypothesis

One way to test whether positive selection is operating on a gene is to compare the relative abundance of synonymous and nonsynonymous substitutions within the gene sequences. For a pair of sequences, this is done by first estimating the number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous site (dN), along with their variances, Var(dS) and Var(dN). With this information, we can test the null hypothesis H0: dN = dS using a Z-test:

Z = (dN - dS) / SQRT(Var(dS) + Var(dN))
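As a minimal sketch, the Z statistic above can be computed directly from the four estimates. The function name `selection_z` is hypothetical, and the estimates are assumed to have been obtained already (the sketch does not estimate dN or dS itself).

```python
import math

def selection_z(d_n: float, d_s: float, var_d_n: float, var_d_s: float) -> float:
    """Z = (dN - dS) / sqrt(Var(dS) + Var(dN)), following the formula above."""
    denom = math.sqrt(var_d_s + var_d_n)
    if denom == 0.0:
        # With zero variance the statistic is undefined.
        raise ValueError("Var(dS) + Var(dN) must be positive")
    return (d_n - d_s) / denom
```

For illustrative (made-up) values dN = 0.12, dS = 0.05, Var(dN) = 0.0004 and Var(dS) = 0.0005, the denominator is 0.03 and Z is about 2.33.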

How the significance level (P-value) of Z is computed, and therefore whether the null hypothesis is rejected, depends on the alternative hypothesis (HA):

H0: dN = dS

HA: (a) dN ≠ dS (test of neutrality).

(b) dN > dS (positive selection).

(c) dN < dS (purifying selection).

For alternative hypotheses (b) and (c), we use a one-tailed test; for (a), we use a two-tailed test. All three tests can be conducted for pairs of sequences, for the overall average across all sequences, or within groups of sequences. When testing for selection pairwise, you can compute the variance of (dN - dS) using either the analytical formulas or the bootstrap resampling method.
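The one-tailed and two-tailed P-values described above can be sketched with the standard normal distribution, assuming Z is approximately standard normal under H0. The function name and the alternative labels ("neutrality", "positive", "purifying") are hypothetical; only Python's standard `math.erfc` is used.

```python
import math

def z_test_p_value(z: float, alternative: str) -> float:
    """P-value for H0: dN = dS under a normal approximation to Z."""
    if alternative == "neutrality":   # HA (a): dN != dS, two-tailed
        return math.erfc(abs(z) / math.sqrt(2.0))
    if alternative == "positive":     # HA (b): dN > dS, upper tail only
        return 0.5 * math.erfc(z / math.sqrt(2.0))
    if alternative == "purifying":    # HA (c): dN < dS, lower tail only
        return 0.5 * math.erfc(-z / math.sqrt(2.0))
    raise ValueError(f"unknown alternative: {alternative!r}")
```

For Z = 1.96, the two-tailed P-value is about 0.05 and the one-tailed P-value for positive selection is about 0.025; for any Z, the two one-tailed P-values sum to 1.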

For data sets containing more than two sequences, you can compute the average number of synonymous substitutions and the average number of nonsynonymous substitutions (averaged over all sequence pairs) and conduct a Z-test in the same way as described above. The variance of the difference between these two averages can be estimated by the bootstrap method (Nei and Kumar [2000], page 56).
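A minimal sketch of the bootstrap variance estimate follows, assuming that codon sites are the resampling unit and that a hypothetical callable `diff_of_averages` returns (average dN - average dS) for a resampled set of codon sites. Computing dN and dS themselves is outside the scope of this sketch, and the default replicate count and seed are arbitrary.

```python
import math
import random
import statistics
from typing import Callable, Sequence

def bootstrap_variance(
    codon_sites: Sequence,
    diff_of_averages: Callable[[list], float],  # hypothetical: avg dN - avg dS
    replicates: int = 500,
    seed: int = 1,
) -> float:
    """Bootstrap variance of (average dN - average dS)."""
    if replicates < 2:
        raise ValueError("need at least two bootstrap replicates")
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    n = len(codon_sites)
    estimates = []
    for _ in range(replicates):
        # Draw n codon sites with replacement to form one pseudo-data set.
        resampled = [codon_sites[rng.randrange(n)] for _ in range(n)]
        estimates.append(diff_of_averages(resampled))
    return statistics.variance(estimates)  # sample variance across replicates

def overall_z(codon_sites, diff_of_averages, replicates=500, seed=1) -> float:
    """Z = (avg dN - avg dS) / sqrt(bootstrap variance of the difference)."""
    var = bootstrap_variance(codon_sites, diff_of_averages, replicates, seed)
    if var <= 0.0:
        raise ValueError("bootstrap variance is zero; Z is undefined")
    return diff_of_averages(list(codon_sites)) / math.sqrt(var)
```

For a quick smoke test, `statistics.mean` over a list of numbers can stand in for `diff_of_averages`; this only exercises the resampling logic and does not reproduce any real dN/dS estimate.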

Variance Estimation Method

Depending on the scope of the analysis (pairwise versus other), you may compute standard errors using analytical formulas or the bootstrap method. Whenever standard errors are estimated by the bootstrap method, you will be prompted for the number of bootstrap replicates and a random number seed.

When the selected test involves the computation of average distance, only the bootstrap method is available for computing standard errors.

Substitution Model

In this set of options, you can choose various attributes of the substitution models for DNA and protein sequences.

Substitutions Type

For this test, the substitution type is limited to Syn-Nonsynonymous.

Model

To select a stochastic model for estimating evolutionary distance, click the yellow row that shows the currently selected model. This reveals a menu containing many different distance methods and models.

Transition/Transversion Ratio

This option will be visible if the chosen model requires you to provide a value for the Transition/Transversion ratio (R).

Data Subset to Use

These are options for handling gaps and missing data and restricting the analysis to labeled sites, if applicable.

Gaps and Missing Data

You may choose to remove all sites containing alignment gaps and missing information before the calculation begins (Complete-deletion option). Alternatively, you may retain all such sites initially and exclude them only as necessary for each pairwise distance estimate (Pairwise-deletion option), or you may use Partial Deletion, which removes sites whose coverage falls below a percentage cutoff that you specify (Site coverage).
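The three gap-handling options can be sketched as site filters over an alignment of equal-length strings. This is an illustration under simplifying assumptions: each alignment column is treated as one site (a codon-based analysis would filter whole codons), `-` and `?` are assumed to mark gaps and missing data, and a site is assumed to be kept under Partial Deletion when its coverage is at least the cutoff. All function names are hypothetical.

```python
from typing import List, Sequence

MISSING = set("-?")  # assumption: '-' = alignment gap, '?' = missing data

def complete_deletion(seqs: Sequence[str]) -> List[int]:
    """Sites kept when no sequence has a gap or missing data there."""
    return [i for i in range(len(seqs[0]))
            if all(s[i] not in MISSING for s in seqs)]

def partial_deletion(seqs: Sequence[str], coverage_cutoff: float) -> List[int]:
    """Sites kept when at least coverage_cutoff percent of sequences have data."""
    n = len(seqs)
    kept = []
    for i in range(len(seqs[0])):
        valid = sum(s[i] not in MISSING for s in seqs)
        if 100.0 * valid / n >= coverage_cutoff:
            kept.append(i)
    return kept

def pairwise_deletion(a: str, b: str) -> List[int]:
    """Sites used for one pair: only those with data in both sequences."""
    return [i for i in range(len(a))
            if a[i] not in MISSING and b[i] not in MISSING]
```

With the toy alignment `["ACGT-A", "AC?TGA", "ACGTGA"]`, Complete deletion keeps sites `[0, 1, 3, 5]`, a 60% coverage cutoff keeps all six sites, and Pairwise deletion for the first and third sequences keeps `[0, 1, 2, 3, 5]`.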

Labeled Sites

This option is available only if labels are associated with some or all of the sites in the data. By clicking on the yellow row, you can choose to include only sites with selected labels. If you choose to include only labeled sites, they are first extracted from the data, and then all of the other options mentioned above are enforced. Note that for a codon to be included in the analysis, all three of its positions must carry one of the selected labels.
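The codon rule for labeled sites can be sketched as follows, assuming one label character per nucleotide site (in reading-frame order) and treating any character not in the selected set, such as a hypothetical `.` for an unlabeled site, as not selected. The function name is hypothetical.

```python
from typing import List, Sequence, Set

def codons_with_selected_labels(site_labels: Sequence[str], selected: Set[str]) -> List[int]:
    """Indices of codons whose three positions all carry a selected label."""
    if len(site_labels) % 3 != 0:
        raise ValueError("number of sites must be a multiple of 3")
    kept = []
    for c in range(len(site_labels) // 3):
        positions = site_labels[3 * c: 3 * c + 3]
        # A codon counts only if every one of its positions is selected.
        if all(label in selected for label in positions):
            kept.append(c)
    return kept
```

For labels `"AAABBAAAA"` with only label `A` selected, codons 0 and 2 are included, while codon 1 is excluded because two of its positions carry label `B`.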