Computing Statistical Quantities for Nucleotide Sequences
In this exercise, we
illustrate the use of the Data Explorer for computing various
statistical quantities of nucleotide sequences. In addition, we explain
shortcuts for obtaining frequently used commands, methods of accessing on-line
help, and the distinction between enabled and disabled commands.
Ex 8.0.1: Start MEGA by double-clicking on the MEGA
desktop icon, or by using the Windows Start menu to click on the MEGA
icon located in the programs folder.
We now will examine the contents of the file Drosophila_Adh.meg
by using the built-in Text Editor.
Ex 8.1.1: Click on the File menu item to expand
the menu options. To activate the text
editor, either click File|Text Editor
or press the F3 key on your keyboard. In the text editor, use the
File|Open command to open the Drosophila_Adh.meg file.
Ex 8.1.2: Examine the Drosophila_Adh.meg
file. Take note of the #mega format specifier, title,
OTU names, and the interleaved sequence data.
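The elements noted above (format specifier, title, OTU names, interleaved blocks) can be illustrated with a small reader. This is a minimal sketch, not MEGA's own parser: the function name `parse_mega` is hypothetical, and it assumes a simplified layout in which the file starts with `#mega`, command statements begin with `!` and end with `;`, and each sequence line begins with `#` followed by the OTU name. Comments, identical-base symbols, and other features of the real format are not handled.

```python
def parse_mega(text):
    """Read a simplified MEGA-style file into (title, {otu_name: sequence})."""
    title = None
    sequences = {}
    command = ""          # accumulates a '!' statement until its closing ';'
    in_command = False
    current = None        # OTU whose sequence is currently being extended
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_command or line.startswith("!"):
            command += " " + line
            in_command = not line.endswith(";")
            if not in_command:
                body = command.strip().lstrip("!").rstrip(";").strip()
                if body.lower().startswith("title"):
                    title = body[5:].strip()
                command = ""
            continue
        if line.lower() == "#mega":
            continue
        if line.startswith("#"):
            parts = line[1:].split(None, 1)
            current = parts[0]
            residues = parts[1] if len(parts) > 1 else ""
            # Repeated names (interleaved blocks) extend the same entry.
            sequences[current] = sequences.get(current, "") + "".join(residues.split())
        elif current is not None:
            # Continuation line belonging to the most recent OTU.
            sequences[current] += "".join(line.split())
    return title, sequences


sample = """#mega
!Title Toy alignment;
!Format DataType=DNA;

#seq_A ATG GCC
#seq_B ATG GCT

#seq_A TAA
#seq_B TAA
"""
print(parse_mega(sample))
```

The toy input is invented for illustration; it is not taken from Drosophila_Adh.meg.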
Ex 8.1.3: We advise that you exit the text editor
before proceeding with data analysis. Select the File menu item from the
text editor's menu, and click the Exit option from the expanded menu. If
the editor asks you if you would like to save the changes that you have made to
the file, select No.
To study statistical quantities of the data in the file Drosophila_Adh.meg,
we must first activate it.
Ex 8.2.1: You can activate a data file using the link
titled “Click me to activate a data file” in the main application window, or
select the File menu item from the main menu and click the Open Data option
from the expanded menu. You may also press the F5 key on your keyboard.
All of these methods will display a standard Windows open file dialog box.
Ex 8.2.2: Open the Drosophila_Adh.meg
data file under the Examples folder.
Ex 8.2.3: A progress dialog box will appear briefly.
When the data file is active, details about it are displayed at the bottom of
the main application window. More menu items now are available on the main menu.
Examine the main menu. Now
that the data file is active, the menu items Data, Distances, Pattern, and
Selection have become available.
We now will use Data Explorer to compute some basic statistics for these data.
Ex 8.3.1: Select the Data|Data
Explorer command, or press the F4 key, if the Sequence Data Explorer is not already open.
Ex 8.3.2: DNA sequences are displayed on the screen in
a grid format. Use the left and right arrow keys (←→) or the mouse
to move from site to site; note a change in the bottom-left corner of the
display. Use the up and down (↑↓) arrow keys or the mouse to move
between OTUs. The Total Sites view on the
bottom-left panel displays the sequence length under the current site position,
and the Highlighted Sites displays “None”
because no special site attributes are yet highlighted.
Ex 8.3.3: To highlight variable sites, select the
Highlight|Variable Sites option, click the button
labeled “V” from the shortcut bar below the menu, or press the V key.
All sites that are variable are highlighted, and the number
in the Highlighted Sites display changes. When you press V
again, the sites return to the normal color, and Highlighted Sites returns to “None”.
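A variable site is an alignment column containing more than one distinct base. The following minimal sketch shows the idea on a toy alignment; the function name `variable_sites` is hypothetical, and ignoring gap and missing-data symbols (`-`, `?`, `N`) is an assumption made here rather than a description of how the Data Explorer treats them.

```python
def variable_sites(seqs):
    """Return 0-based indices of columns that contain at least two different bases."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        # Distinct bases in this column, with gap/missing symbols removed.
        column = {s[i].upper() for s in seqs} - {"-", "?", "N"}
        if len(column) > 1:
            sites.append(i)
    return sites


toy = ["ATGCCA", "ATGTCA", "ATGTCG"]
print(variable_sites(toy))  # prints [3, 5]
```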
Ex 8.3.4: Now, to highlight the parsimony-informative sites,
press the P key, click on the button labeled “Pi” from the shortcut bar
below the menu, or select the Highlight|Parsim-info sites menu command.
To highlight 0, 2, and 4-fold degenerate sites, press the 0, 2,
or 4 keys, respectively, click on the corresponding button from the
shortcut bar below the menu, or select the corresponding command from the Highlight menu.
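Under the usual definition, a site is parsimony-informative when at least two different bases each occur in at least two sequences. The sketch below applies that definition to a toy alignment; the function name is hypothetical, and dropping `-`, `?`, and `N` before counting is an assumption of this example.

```python
from collections import Counter


def parsimony_informative_sites(seqs):
    """Return 0-based indices of columns where >= 2 bases each appear >= 2 times."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        for symbol in ("-", "?", "N"):
            counts.pop(symbol, None)  # ignore gaps and missing data
        shared = [base for base, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            sites.append(i)
    return sites


toy = ["ATGCCA", "ATGCCA", "ATGTCG", "ATGTCA"]
print(parsimony_informative_sites(toy))  # prints [3]
```

In the toy alignment, column 5 is variable but not informative, because G occurs in only one sequence.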
Ex 8.3.5: To compute the nucleotide base frequencies,
select the Statistics|Nucleotide Composition menu command. This will calculate the composition and display
the results of the calculation in a text file using the built-in text editor.
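Base frequencies are simply the share of each nucleotide among the counted sites. A minimal sketch for a single sequence, assuming that only A, C, G, and T are counted and that results are reported as percentages (the function name is hypothetical):

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the percentage of A, C, G, and T among the unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


print(nucleotide_composition("AACCGGTT"))
# prints {'A': 25.0, 'C': 25.0, 'G': 25.0, 'T': 25.0}
```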
Ex 8.3.6: To compute codon
usage, select the Statistics|Codon Usage menu
command. This will calculate the codon usage and
display the results of the calculation in a text file using the built-in text editor.
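At its simplest, codon usage is a tally of in-frame triplets. The sketch below counts codons in one coding sequence, assuming the reading frame starts at the first base and discarding any trailing partial codon; the function name is hypothetical, and the example covers raw counts only.

```python
from collections import Counter


def codon_counts(seq):
    """Count in-frame codons, starting at position 0 and ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


counts = codon_counts("ATGGCCGCCTAA")
print(counts["GCC"])  # prints 2
```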
Ex 8.3.7: To compute nucleotide pair frequencies,
select the Statistics|Nucleotide Pair
Frequencies|Directional, or the Statistics|Nucleotide Pair Frequencies|Unidirectional menu command.
This will calculate the pair frequencies and display the results of the
calculation in a text file using the built-in text editor.
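One way to read the two menu options is that directional counting treats the pair (A, G) as different from (G, A), giving 16 possible pairs, while the other option pools them, giving 10. The sketch below counts site-by-site pairs between two aligned sequences under that interpretation, which is an assumption of this example rather than a statement of MEGA's exact procedure; the function name and the `directional` parameter are hypothetical.

```python
from collections import Counter


def pair_frequencies(seq1, seq2, directional=True):
    """Count nucleotide pairs observed at matching sites of two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous symbols
        # Ordered pair when directional; alphabetically sorted pair otherwise.
        pair = (a, b) if directional else tuple(sorted((a, b)))
        counts[pair] += 1
    return counts


print(pair_frequencies("AGGT", "GAGT", directional=True)[("A", "G")])   # prints 1
print(pair_frequencies("AGGT", "GAGT", directional=False)[("A", "G")])  # prints 2
```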
Ex 8.3.8: To translate these protein-coding sequences
into amino acid sequences and back, press the T key, or select the
Data|Translate/Untranslate menu command from
the Data Explorer menu.
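Translation maps each in-frame codon to an amino acid through a genetic code table. The following sketch assumes the standard genetic code and a reading frame that starts at the first base; the names `CODON_TABLE` and `translate` are hypothetical, and codons containing gaps or ambiguous bases are rendered as `?`.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a coding sequence; '*' marks a stop codon, '?' an unrecognized codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "?") for i in range(0, usable, 3))


print(translate("ATGGCCTAA"))  # prints MA*
```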
Ex 8.3.9: Once the sequences are translated, calculate
the amino acid composition by selecting the Statistics|Amino Acid Composition menu command from the Data Explorer menu.
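Amino acid composition follows the same logic as base composition, applied to the translated sequence. A minimal sketch, assuming one-letter residue codes and excluding stop symbols and other non-letter characters from the total (the function name is hypothetical):

```python
from collections import Counter


def amino_acid_composition(protein):
    """Return the percentage of each residue, ignoring '*', '-', and '?'."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in sorted(counts.items())}


print(amino_acid_composition("MAAK*"))
# prints {'A': 50.0, 'K': 25.0, 'M': 25.0}
```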
Ex 8.3.10: To shut down MEGA, select the
File|Exit menu command from the main MEGA
application window and close the data file.