Setup/Select Genes & Domains Dialog

 

Use the Gene & Domain Editor to inspect, define, and select domains and genes, and to assign labels to individual sites.

 

The Genes & Domains dialog consists of two tabs: Define/Edit/Select and Site Labels.

 

Define/Edit/Select tab

This tab contains a hierarchical listing of gene and domain names with the corresponding information organized into four columns for amino acid sequences and six columns for nucleotide sequences. 

 

Gene and domain name listing

Each line in this display contains a small 'expand/contract' box, a checkbox, a gene/domain icon, and the name of the gene or domain. The 'expand/contract' box allows you to display or hide the information below a given gene. The checkbox shows whether the gene or domain is currently selected for analysis.  All defined genes and domains appear below the Genes\Domain node in the hierarchy.  All domain names are shown with a yellow background.  The Independent node shows the number of independent sites, which are not assigned to any domains or genes.

 

If your input data file does not contain any domains, then MEGA automatically creates a domain called Data.  If you wish to create new domains, you should delete the Data domain to make all sites independent.  Remember that only independent sites can be assigned to domains, and a site cannot be assigned to more than one domain.  Genes are simply collections of domains, so gene boundaries are determined by the domains they contain.  The MEGA gene and domain organizer is flexible and is designed to let you specify genes and domains as they appear in a genome.  For instance, a sequence may contain one or more genes, each of which may contain one or more domains.  In between genes, there may be intergenic domains.  In addition, within or between genes or domains, there may be sites that are not members of any domain.
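
The assignment rules above can be illustrated with a small sketch. This is a conceptual model only, not MEGA code or the MEGA file format; the function names (assign_domain, gene_bounds) and the domain names (Exon1, Intron1, Exon2) are hypothetical and chosen for illustration.

```python
# Illustrative model only; this is not MEGA code or the MEGA file format.
def assign_domain(owner, name, start, end):
    """Assign sites start..end (1-based, inclusive) to the domain `name`.

    `owner` maps each site number to a domain name, or None if the site
    is independent. Only independent sites may be assigned.
    """
    taken = [s for s in range(start, end + 1) if owner[s] is not None]
    if taken:
        raise ValueError(f"sites {taken} already belong to another domain")
    for s in range(start, end + 1):
        owner[s] = name


def gene_bounds(owner, domains_in_gene):
    """Return (first, last) site covered by the domains of a gene."""
    sites = [s for s, d in owner.items() if d in domains_in_gene]
    return min(sites), max(sites)


owner = {site: None for site in range(1, 21)}  # 20 sites, all independent
assign_domain(owner, "Exon1", 1, 6)
assign_domain(owner, "Intron1", 7, 12)
assign_domain(owner, "Exon2", 13, 18)

# The gene boundary follows from the domains the gene contains.
bounds = gene_bounds(owner, {"Exon1", "Intron1", "Exon2"})
# Sites never assigned to a domain remain independent.
independent = [s for s, d in owner.items() if d is None]
```

In this sketch, trying to assign sites 5 to 8 to a further domain raises an error, because sites 5 to 8 are no longer independent.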

 

At the bottom of this tab, you will find a toolbar with many drop-down menu buttons, which can be used to Add/Insert new genes or domains.  The add and insert operations differ in the following way.  If you add a gene or domain, then the new gene or domain will be added at the end of the list to which the currently focused gene or domain belongs.   If you insert a gene (or domain), it will be inserted by shifting all the following genes or domains down.  Add and Insert commands are context sensitive.

 

You can rearrange the relative position of genes and domains by drag-and-drop operations. 

 

Inspecting/modifying attributes of genes and domains

When you start, all genes and domains are shown.  Click on the ‘+’ in the expand/contract box to expand the listing for each gene to its domains. Click on the ‘-’ to collapse to the gene. To select and deselect genes or domains from analysis, click in the corresponding checkbox. When a gene is selected but some domains within the gene are not, the checkbox for the gene will be grayed. If you deselect a gene, all domains within that gene are automatically deselected.
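
The checkbox behavior described above amounts to a three-state display for genes. The following is a minimal sketch of that logic, not MEGA's implementation; the function names and the state strings ('checked', 'grayed', 'unchecked') are assumptions made for illustration.

```python
# Illustrative only; names and state strings are hypothetical.
def gene_checkbox_state(domain_selected):
    """Return the display state of a gene's checkbox.

    `domain_selected` maps each domain name in the gene to True or False.
    """
    flags = list(domain_selected.values())
    if all(flags):
        return "checked"      # every domain in the gene is selected
    if any(flags):
        return "grayed"       # gene selected, but some domains are not
    return "unchecked"        # no domain in the gene is selected


def deselect_gene(domain_selected):
    """Deselecting a gene deselects all of its domains."""
    return {name: False for name in domain_selected}


gene = {"Exon1": True, "Intron1": False, "Exon2": True}
state_before = gene_checkbox_state(gene)
state_after = gene_checkbox_state(deselect_gene(gene))
```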

 

On the right side of the gene and domain hierarchy, you will find at least four columns of information for each domain and gene.  All information shown for genes is computed based on the domains contained.

 

The first two columns show the site number in the sequence where the domain begins (From column) and where it ends (To column).  The column next to the To column shows the total number of sites in the domain, computed automatically from the range given in the From and To columns.  A question mark (?) shows that the domain exists but that its range of sites is not yet specified.
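
As a quick check on the arithmetic, the site total can be sketched as follows. The sketch assumes that the From and To values are both inclusive, which the description suggests but does not state outright; the function name site_count is hypothetical.

```python
# Illustrative only; assumes From and To are inclusive site numbers.
def site_count(start, end):
    """Return the number of sites in a domain, or '?' if the range is unset."""
    if start is None or end is None:
        return "?"            # domain exists, range not yet specified
    return end - start + 1    # inclusive range


defined = site_count(13, 18)
undefined = site_count(None, None)
```

Under this assumption, a domain running from site 13 to site 18 contains six sites.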

 

To specify or change the sites that belong to a given domain, click on the domain name.  The corresponding rows in the From and the To columns then contain a button with three dots (an ellipsis).  To change the start site, click on the ellipsis button in the From column. This will bring up a small Site Picker dialog box in which you can highlight the desired site and click OK.  In this viewer, you will see that sites have different background colors.  A white background marks independent sites, a red background indicates that the site is used by another domain, and a yellow background shows that the site belongs to the domain being edited.  To cancel any changes, click on Cancel in the Site Picker dialog box.

 

For nucleotide sequences, two additional columns are found in the Define/Edit/Select tab:  the Coding? column and the Codon Start column.  A check-mark in the Coding? column shows that a given domain is protein coding.  If it is checked, then the next column allows you to specify whether the first site in the domain is in the first, second, or the third codon position.
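
To show what the Codon Start setting implies for the rest of a coding domain, the sketch below derives the codon position of every site from the position of the first site. It is an illustration of the arithmetic only, not MEGA code; the function name codon_positions is hypothetical.

```python
# Illustrative only; not MEGA code.
def codon_positions(n_sites, codon_start):
    """Return the codon position (1, 2, or 3) of each site in a coding domain.

    `codon_start` is the codon position of the first site in the domain.
    """
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2, or 3")
    return [((codon_start - 1 + i) % 3) + 1 for i in range(n_sites)]


positions = codon_positions(7, 2)
```

Here a seven-site domain whose first site is a second codon position has the positions 2, 3, 1, 2, 3, 1, 2.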

 

Site Labels Tab

This tab displays sequences and allows you to label individual sites.  To do this, change the default underscore (_) in the topmost line to the label of choice and give it a light green background.  The site number is displayed in a window below, next to which the name of the domain is shown, along with the gene name.  Labeled sites can be selected or deselected for analysis.

 

To change or give a label to a site, click on the site and type in the character you wish to mark it with. You can use the left and right arrow buttons on the keyboard to move to and then label adjacent sites. To change a label, simply overtype it. To remove a label, use the spacebar to type a space.
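
One way to picture the label line is as a string with one character per site, from which sites sharing a label can be collected into groups. The sketch below is a conceptual illustration, not MEGA code; the function name sites_by_label is hypothetical, and treating both the underscore and a space as "no label" is an assumption.

```python
# Illustrative only; treats '_' and ' ' as unlabeled (an assumption).
from collections import defaultdict


def sites_by_label(label_row):
    """Group 1-based site numbers by their label character."""
    groups = defaultdict(list)
    for site, ch in enumerate(label_row, start=1):
        if ch not in ("_", " "):
            groups[ch].append(site)
    return dict(groups)


groups = sites_by_label("__a_ab__b_")
```

In this example, sites 3 and 5 carry the label a, and sites 6 and 9 carry the label b; all other sites are unlabeled.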

 

Example 

Imagine an alignment consisting of a genomic sequence, including a gene and its upstream and downstream regions. You can define each intron and exon as a domain, and then define the overall gene, assigning the exons and introns to that gene. The upstream and downstream regions can also be defined as domains, or possibly multiple domains, depending on the analysis you wish to perform. These domains do not have to be assigned to any gene. Furthermore, some sites may be left unassigned, as independent sites. These can be scattered throughout the sequence and can be included or excluded from analysis as a group.  If you have complicated patterns of sites that you wish to analyze as groups, and the domain/gene approach is unsuitable, you should assign a category to these sites, which can be specified in addition to the groups and domains.