Sequence Data Explorer

The Sequence Data Explorer displays the aligned sequence data. You can scroll along the alignment using the scrollbar at the bottom right of the explorer window. The explorer also provides tools for examining the statistical attributes of the data and for selecting data subsets.

This explorer consists of a number of regions as follows:

Menu Bar

Data menu

Display menu

Highlight menu

Statistics menu

Help: This item brings up the help file for the Sequence Data Explorer.

Tool Bar

The tool bar provides quick access to the following menu items:

·      General Utilities

·      Export button (diskette icon): This brings up the Exporting Sequence Data dialog box, which contains options to control how MEGA writes the output data.

·      Color: This brings up a color palette selection box in which you can choose the color used for highlighted sites.

·      Genes/Domains button: This brings up the dialog box for setting up and selecting domains and genes.

·      Taxa/Groups button: This brings up the dialog box for setting up, editing, and selecting taxa and groups of taxa.

·      Identical-symbol button: This toggle replaces the nucleotide (or amino acid) at a site with the identical symbol (e.g., a dot) if it is the same as the nucleotide (or amino acid) at that site in the first sequence.

·      Highlighting Sites

·      C: If this button is pressed, then all constant sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

·      V: If this button is pressed, then all variable sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

·      Pi: If this button is pressed, then all parsimony-informative sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

·      S: If this button is pressed, then all singleton sites will be highlighted. A count of the highlighted sites will be displayed on the status bar.

·      0: If this button is pressed, then sites will be highlighted only if they are zero-fold degenerate sites in all sequences displayed. A count of the highlighted sites will be displayed on the status bar. (This button is available only if the dataset contains protein-coding DNA sequences.)

·      2: If this button is pressed, then sites will be highlighted only if they are two-fold degenerate sites in all sequences displayed. A count of the highlighted sites will be displayed on the status bar. (This button is available only if the dataset contains protein-coding DNA sequences.)

·      4: If this button is pressed, then sites will be highlighted only if they are four-fold degenerate sites in all sequences displayed. A count of the highlighted sites will be displayed on the status bar. (This button is available only if the dataset contains protein-coding DNA sequences.)

·      Translate button: This button translates the codons in the sequence data into amino acid sequences and back. All protein-coding regions are automatically identified and translated for display. When the translated sequences are already displayed, issuing this command displays the original nucleotide sequences (including all coding and non-coding regions). Depending on the data displayed (translated or nucleotide), the relevant menu options in the Sequence Data Explorer become enabled. Note that the translated/untranslated status in this data explorer has no effect on the analysis options available in MEGA (e.g., the Distances or Phylogeny menus), as MEGA provides all possible options for your dataset at all times.
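The site categories behind the C, V, Pi, and S buttons can be illustrated with a short Python sketch. It is not MEGA's code. It assumes the commonly used definitions: a constant site has one state; a variable site has two or more; a parsimony-informative site has at least two states that each occur at least twice; and a singleton site is a variable site in which at most one state occurs more than once. The function names and the treatment of "-" and "?" as ignored symbols are assumptions made for this illustration.

```python
from collections import Counter

# Assumption: gap and missing-data symbols are skipped when classifying a site.
IGNORED = set("-?")


def classify_site(column):
    """Return the set of category labels (C, V, Pi, S) for one alignment column."""
    counts = Counter(ch.upper() for ch in column if ch not in IGNORED)
    if not counts:
        return set()          # nothing but gaps/missing data
    if len(counts) == 1:
        return {"C"}          # constant: a single state
    # Variable: two or more states. Count states seen at least twice.
    repeated_states = sum(1 for n in counts.values() if n >= 2)
    if repeated_states >= 2:
        return {"V", "Pi"}    # parsimony-informative
    return {"V", "S"}         # singleton


def count_sites(sequences):
    """Tally each category over all columns, like the status bar count."""
    totals = Counter()
    for column in zip(*sequences):
        totals.update(classify_site(column))
    return totals


if __name__ == "__main__":
    alignment = ["ACGTA", "ACGAA", "ATGAC", "ATGTA"]
    print(dict(count_sites(alignment)))
    # Counts for this alignment: C=2, V=3, Pi=2, S=1
```

Under these definitions every variable site is either parsimony-informative or a singleton, so the Pi and S counts add up to the V count.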

The 2-Dimensional Data Grid

Fixed Row: This is the first row in the data grid. It displays the nucleotides (or amino acids) of the first sequence when you have chosen to show identity with that sequence using a special character. For protein-coding regions, it also marks the first, second, and third codon positions.

Fixed Column: This is the first and the leftmost column in the data grid. It is always visible, even when you are scrolling through sites. The column contains the sequence names and an associated check box. You can check or uncheck this box to include or exclude a sequence from analysis. Also in this column, you can drag-and-drop sequences to sort them.

Rest of the Grid: Cells to the right of the first column and below the first row contain the nucleotides or amino acids of the input data. Note that cells are drawn in a light color if they contain data from unselected sequences, genes, or domains.
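The identical-symbol display described for the Fixed Row can be sketched in Python as follows. This is only an illustration of the idea, not MEGA's implementation: the function name `mask_identical` is hypothetical, and the sketch assumes that every sequence is compared against the first one and that all sequences have the same length.

```python
def mask_identical(sequences, symbol="."):
    """Replace each residue that matches the first sequence with `symbol`.

    The first sequence is kept unchanged so it can serve as the reference row.
    """
    reference = sequences[0]
    masked = [reference]
    for seq in sequences[1:]:
        masked.append(
            "".join(symbol if a == b else a for a, b in zip(seq, reference))
        )
    return masked


if __name__ == "__main__":
    for row in mask_identical(["ACGTA", "ACGAA", "ATGAC"]):
        print(row)
    # ACGTA
    # ...A.
    # .T.AC
```

Only the differences from the first sequence remain visible, which makes variable positions easy to spot when scanning the grid.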

Status Bar

This section displays the location of the focused site and the total sequence length. It also shows the site label, if any, and a count of the highlighted sites.