Sequence Data Explorer

The Sequence Data Explorer shows the aligned sequence data. You can scroll along the alignment using the scrollbar at the bottom right-hand side of the explorer window. The Sequence Data Explorer also provides a number of utilities for exploring the statistical attributes of the data and for selecting data subsets.

This explorer consists of a number of regions as follows:

Menu Bar

Data menu

Help: This item brings up the help file for the Sequence Data Explorer.

Tool Bar

The tool bar provides quick access to the following menu items:

General Utilities

: This brings up the Exporting Sequence Data dialog box, which contains options to control how MEGA writes the output data. The available output formats are Text, MEGA, CSV, and Excel.

: This brings up the Exporting Sequence Data dialog box and sets the default output format to MEGA.

: This brings up the Exporting Sequence Data dialog box and sets the default output format to Excel.

: This brings up the Exporting Sequence Data dialog box and sets the default output format to CSV (comma-separated values).

: This brings up the dialog box for setting up and selecting domains and genes.

: This brings up the dialog box for setting up, editing, and selecting taxa and groups of taxa.

: This toggle replaces the nucleotide/amino acid at a site with the identical symbol (e.g., a dot) wherever a sequence contains the same nucleotide/amino acid as the first sequence at that site.

: This button provides the facility to translate codons in the sequence data into amino acid sequences and back. All protein-coding regions will be automatically identified and translated for display. When the translated sequence is already displayed, then issuing this command displays the original nucleotide sequences (including all coding and non-coding regions). Depending on the data displayed (translated or nucleotide), relevant menu options in the Sequence Data Explorer become enabled. Note that the translated/un-translated status in this data explorer does not have any impact on the options for analysis available in MEGA (e.g., Distances or Phylogeny menus), as MEGA provides all possible options for your dataset at all times.
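As a rough illustration of what translating codons into amino acids involves, here is a minimal Python sketch. It is not MEGA's implementation. It assumes the standard genetic code, treats the whole sequence as a single coding region starting at the first codon position, and uses a hypothetical function name (translate). MEGA itself identifies the protein-coding regions from the data set and may use a different code table.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (an assumption; other genetic code tables exist).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame nucleotide sequence codon by codon.

    Alignment gaps ('---') become '-', and any codon containing an
    ambiguous or unknown symbol becomes '?'. A trailing partial codon
    is ignored.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "?"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # stop codons are shown as '*'
```

Because the function returns a new string and leaves its input untouched, switching back to the nucleotide display only requires keeping the original sequences.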

Highlighting Sites

C: If this button is pressed, then all constant sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

V: If this button is pressed, then all variable sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

Pi: If this button is pressed, then all parsimony-informative sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

S: If this button is pressed, then all singleton sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

L: If this button is pressed, then all labelled sites will be highlighted and a count of highlighted sites will be displayed on the status bar (see also labelled sites).

0: If this button is pressed, then sites will be highlighted only if they are zero-fold degenerate sites in all sequences displayed. A count of highlighted sites will be displayed on the status bar. (This button is available only if the dataset contains protein-coding DNA sequences.)

2: If this button is pressed, then sites will be highlighted only if they are two-fold degenerate sites in all sequences displayed. A count of highlighted sites will be displayed on the status bar. (This button is available only if the dataset contains protein-coding DNA sequences.)

4: If this button is pressed, then sites will be highlighted only if they are four-fold degenerate sites in all sequences displayed. A count of highlighted sites will be displayed on the status bar. (This button is available only if the dataset contains protein-coding DNA sequences.)

Special: This dropdown allows for the selection of a special highlighting option.

CpG/TpG/CpA: If this option is selected, then all sites at which a C is followed by a G, a T by a G, or a C by an A will be highlighted. You may also specify the percentage of sequences that must have these properties for a site to be counted.

Coverage: If this option is selected, you will be asked to enter a percentage. All sites with this percentage or less of ambiguous data will be highlighted.

: This button allows you to quickly navigate between highlighted sites by jumping to the previous or next highlighted site.
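The C, V, Pi, and S categories above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MEGA's code: it uses the common definitions (a constant site has one character state; a variable site has at least two; a parsimony-informative site has at least two states that each occur in two or more sequences; a singleton site is a variable site that is not parsimony-informative), it ignores gap and missing-data symbols, and the names site_labels and count_highlighted are hypothetical.

```python
from collections import Counter

# Symbols skipped when classifying a site (an assumption of this sketch).
IGNORED = set("-?")


def site_labels(column: str) -> set:
    """Return the highlight labels that apply to one alignment column."""
    counts = Counter(ch for ch in column.upper() if ch not in IGNORED)
    if not counts:
        return set()            # nothing but gaps or missing data
    if len(counts) == 1:
        return {"C"}            # constant site
    # Number of character states seen in at least two sequences.
    repeated = sum(1 for n in counts.values() if n >= 2)
    if repeated >= 2:
        return {"V", "Pi"}      # variable and parsimony-informative
    return {"V", "S"}           # variable and singleton


def count_highlighted(alignment: list, label: str) -> int:
    """Count the sites carrying a label, as the status bar would report."""
    return sum(
        1 for column in zip(*alignment)
        if label in site_labels("".join(column))
    )


alignment = ["AAGA", "AAGA", "ACAA", "ACAT"]
for label in ("C", "V", "Pi", "S"):
    print(label, count_highlighted(alignment, label))
```

Returning a set of labels reflects the fact that every parsimony-informative or singleton site is also a variable site.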

Searching

: This button allows you to specify a sequence name to find. Search results are shown in bold and the matching row is highlighted in blue. MEGA first looks for an exact match to the name you specified. If none exists, it looks for names that start with the search term. If no names start with the search term, MEGA looks for the search term anywhere in the names (rather than just at the start).

: This button allows you to specify a motif to search for in the sequence data. The motif may contain IUPAC codes such as R (for A or G) and Y (for T or C). MEGA highlights (in yellow) the first instance of the motif it finds.

and : These buttons are enabled only if you have already searched for a sequence name or motif. Clicking the forward or backward button makes MEGA move to the next or previous search result (assuming there is more than one match).
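The two searches described above can be sketched as follows. This is a hedged illustration, not MEGA's implementation: the function names (find_names, motif_to_regex, find_motif) are hypothetical, name matching is assumed to be case-insensitive, and the motif is matched by expanding each IUPAC nucleotide code into a regular-expression character class.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_names(names: list, term: str) -> list:
    """Return row indices of matching names, trying exact, then prefix,
    then substring matches, and stopping at the first tier with a hit."""
    term = term.lower()
    tiers = (
        lambda name: name == term,
        lambda name: name.startswith(term),
        lambda name: term in name,
    )
    for matches in tiers:
        hits = [i for i, name in enumerate(names) if matches(name.lower())]
        if hits:
            return hits
    return []


def motif_to_regex(motif: str):
    """Compile a motif into a regex; raises KeyError on a non-IUPAC symbol."""
    return re.compile("".join("[" + IUPAC[ch] + "]" for ch in motif.upper()))


def find_motif(sequence: str, motif: str, start: int = 0) -> int:
    """Return the index of the first motif match at or after start, or -1."""
    match = motif_to_regex(motif).search(sequence.upper(), start)
    return match.start() if match else -1


names = ["Human", "Chimp", "Humanoid", "Old_Human"]
print(find_names(names, "hum"))
print(find_motif("ACGTTAGC", "RY"))
```

Passing the previous match position plus one as start gives the behavior of the forward button: the search resumes just past the last result.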

The 2-Dimensional Data Grid

Fixed Row: This is the first row in the data grid. It displays the nucleotides (or amino acids) of the first sequence when you have chosen to show identical sites using a special character. For protein-coding regions, it also clearly marks the first, second, and third codon positions.

Fixed Column: This is the first and the leftmost column in the data grid. It is always visible, even when you are scrolling through sites. The column contains the sequence names and an associated check box. You can check or uncheck this box to include or exclude a sequence from analysis. Also in this column, you can drag-and-drop sequences to sort them.

Rest of the Grid: Cells to the right of and below the first row contain the nucleotides or amino acids of the input data. Note that cells are drawn in a light color if they contain data belonging to unselected sequences, genes, or domains.
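The identity display mentioned for the fixed row can be illustrated with a short Python sketch. It assumes that the first sequence is the reference, that the identical symbol is a dot, and that gap and missing-data symbols are always shown as they are. The function name with_identity_symbol is hypothetical, and MEGA's actual rules may differ.

```python
def with_identity_symbol(alignment: list, symbol: str = ".") -> list:
    """Return a display copy of the alignment in which characters equal
    to the first sequence at the same site are replaced by symbol."""
    if not alignment:
        return []
    reference = alignment[0]
    shown = [reference]  # the first sequence is always shown in full
    for seq in alignment[1:]:
        shown.append("".join(
            # Gaps and missing data are kept as-is (an assumption).
            symbol if ch == ref and ch not in "-?" else ch
            for ch, ref in zip(seq, reference)
        ))
    return shown


for row in with_identity_symbol(["ACGT", "ACGA", "TCGT"]):
    print(row)
```

Only the displayed text changes; the underlying data are untouched, so the toggle can be switched off without loss.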

Status Bar

This section displays the location of the focused site and the total sequence length. It also shows the site label, if any, and a count of the highlighted sites.