Gene Duplication Inference

 

Gene Duplication Wizard

MEGA7 introduces a wizard-style system for identifying gene duplications (and, optionally, speciation events) in a gene family tree. The analysis is available from the User Tree menu on the main MEGA form. Launching it opens the Gene Duplication Wizard, which guides the user through the steps for inferring gene duplications.

Input Data

·      Gene Tree File – the gene family tree in a Newick formatted file is required for the analysis.

·      Species Tree File – an optional species tree in a Newick formatted file. If a species tree is provided, the algorithm described in Zmasek and Eddy (2001) is used to infer gene duplications and speciation events for all internal nodes in the gene tree. If no species tree is provided, every internal node of the gene tree whose two descendant clades share at least one species is marked as a gene duplication event.

·      Mapping of taxa names to species names – the species name for each taxon must be provided, and a simple grid-like dialog is available for this task. In this dialog, users can either enter the species name for each taxon manually or load the names from a text file. The file gives one mapping per taxon, each on its own line, in the form

taxonName=speciesName
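
As an illustration of the input data described above, the following minimal Python sketch reads a mapping in the taxonName=speciesName form and applies the rule used when no species tree is provided: an internal node is marked as a duplication if its two descendant clades share at least one species. This is not MEGA's implementation. The function names, the nested-tuple tree representation, and the taxon names are assumptions made only for this example.

```python
def parse_species_map(text):
    """Parse lines of the form taxonName=speciesName into a dict."""
    mapping = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        taxon, sep, species = line.partition("=")
        if not sep or not taxon.strip() or not species.strip():
            raise ValueError(
                f"line {line_no}: expected taxonName=speciesName, got {raw!r}"
            )
        mapping[taxon.strip()] = species.strip()
    return mapping


def find_duplications(node, species_of, duplications):
    """Return the set of species below `node` in a rooted binary tree.

    Leaves are taxon-name strings and internal nodes are 2-tuples
    (a hypothetical representation chosen for this sketch).
    Nodes whose two child clades share a species are appended
    to `duplications`.
    """
    if isinstance(node, str):  # leaf: look up its species
        return {species_of[node]}
    left, right = node
    left_species = find_duplications(left, species_of, duplications)
    right_species = find_duplications(right, species_of, duplications)
    if left_species & right_species:  # at least one species in common
        duplications.append(node)
    return left_species | right_species


# Hypothetical mapping file content and rooted gene tree.
mapping_text = """\
geneA_human=human
geneA_mouse=mouse
geneB_human=human
geneB_mouse=mouse
"""
species_of = parse_species_map(mapping_text)
tree = (("geneA_human", "geneA_mouse"), ("geneB_human", "geneB_mouse"))

duplications = []
find_duplications(tree, species_of, duplications)
```

In this example only the root is marked, because both of its clades contain human and mouse, while each inner clade contains two different species.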

Steps for Doing the Gene Duplication Analysis

1.    Load the gene tree file – in the first step, the wizard is used to browse for and load the Newick formatted gene tree file.

2.    Map species names to taxa names – in the second step, species names are mapped to taxa names using a grid-like interface. Species names can be entered manually or imported from a text file that gives one mapping per line, in the form

taxonName=speciesName

3.    Load a species tree (optional) – if a species tree is available, the wizard can be used to browse for and load the Newick formatted species tree file.

4.    Root the gene tree (optional) – if the root of the gene tree is known, the wizard can be used to specify it. If this option is chosen, the gene tree is displayed in the Tree Explorer window and users specify the root by clicking on the branch or node on which to root the tree. If the root is not known, the analysis is performed with all possible root placements; the placement(s) that result in the minimum number of gene duplications are kept and all others are discarded.

5.    Root the species tree – if a species tree is provided, the analysis requires it to be rooted. The species tree is rooted in the same way as the gene tree, via the Tree Explorer.

6.    Launch the analysis – the final step is to launch the analysis. A progress window is displayed while the calculation runs. Once the analysis is complete, the gene family tree is displayed in the Tree Explorer window with gene duplications marked by solid blue diamonds. If a species tree was provided, speciation events are marked by open red diamonds.
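
The search over root placements described in step 4 can be sketched as follows. This is a minimal Python illustration and not MEGA's implementation. It assumes an unrooted binary gene tree, no species tree (so a node counts as a duplication when its two clades share a species), and a hypothetical adjacency-list representation. All function, node, and taxon names are invented for the example.

```python
from functools import lru_cache


def min_duplication_rootings(adjacency, species_of):
    """Score every edge of an unrooted binary gene tree as a candidate root.

    adjacency  -- dict mapping each node to a list of its neighbours
    species_of -- dict mapping each leaf (taxon name) to its species name
    Returns (minimum duplication count, sorted list of best root edges).
    """

    @lru_cache(maxsize=None)
    def below(node, parent):
        # Species set and duplication count for the subtree at `node`,
        # viewed with `parent` as the direction of the root.
        children = [n for n in adjacency[node] if n != parent]
        if not children:  # leaf
            return frozenset([species_of[node]]), 0
        left, right = (below(child, node) for child in children)
        overlap = 1 if left[0] & right[0] else 0
        return left[0] | right[0], left[1] + right[1] + overlap

    scores = {}
    for u, neighbours in adjacency.items():
        for v in neighbours:
            edge = tuple(sorted((u, v)))
            if edge in scores:
                continue  # each undirected edge is scored once
            species_u, dups_u = below(u, v)
            species_v, dups_v = below(v, u)
            # A root placed on this edge is itself a duplication
            # if the two sides share a species.
            root_dup = 1 if species_u & species_v else 0
            scores[edge] = dups_u + dups_v + root_dup

    best = min(scores.values())
    return best, sorted(e for e, s in scores.items() if s == best)


# Hypothetical unrooted gene tree with internal nodes "x" and "y".
adjacency = {
    "x": ["geneA_human", "geneA_mouse", "y"],
    "y": ["geneB_human", "geneB_mouse", "x"],
    "geneA_human": ["x"],
    "geneA_mouse": ["x"],
    "geneB_human": ["y"],
    "geneB_mouse": ["y"],
}
species_of = {
    "geneA_human": "human",
    "geneA_mouse": "mouse",
    "geneB_human": "human",
    "geneB_mouse": "mouse",
}

best, roots = min_duplication_rootings(adjacency, species_of)
```

Here rooting on the internal edge between "x" and "y" needs one duplication, while rooting on any leaf edge needs two, so only the internal edge is kept. Caching the per-direction subtree results keeps the search from recomputing each clade for every candidate root.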