7. Walk through MEGA


This chapter provides a tutorial for using MEGA through six examples. The data files for these examples can be found in the C:\MEGA\EXAMPLES directory. In these example files, the data are deliberately written in different input formats. We recommend studying the examples in the order presented, because techniques introduced in earlier examples are used in later ones.

In the following discussion, highlighted words indicate the keys to be pressed on the keyboard. If two keys are required to be pressed simultaneously, they are shown with a + sign between them (e.g., Alt + F3 means that the Alt and F3 keys should be pressed simultaneously). Italicized letters are used to mark the commands available in menus, submenus, and other options as they appear on the computer screen at various times. In every example, we discuss many procedures to introduce the techniques of analysis. For ease of reference in later examples, these procedures are arranged in steps that are numbered in the Ex u.v.w format, where u is the example number, v is the procedure number in the u-th example, and w is the step number in the procedure v. For instance, Ex 1.3.2 refers to the 2nd step of the 3rd procedure in example 1.

7.1 Constructing Trees from Distance Data

This example introduces procedures for changing the default directory, selecting options from menus, opening files in the read-only mode, activating a distance data file for analysis, and building trees from the distance data.

Ex 1.0.1Go to the C:\MEGA directory, type MEGA on the C:\MEGA> DOS prompt, and press Enter.
Ex 1.0.2A Welcome box appears on the screen that displays the current version of MEGA and the names and addresses of the authors.
Ex 1.0.3Press Enter to remove this box from the screen.
Since all the example files are located in the C:\MEGA\EXAMPLES directory, we first set C:\MEGA\EXAMPLES as the current working directory.
Ex 1.1.1Press F10 to go to the main menu. A sliding bar will then appear at the top of the screen. Using the arrow keys (→, ←), go to the File option and press Enter. The File menu unfolds.
Ex 1.1.2Using the down arrow key ( ↓ ), go to the Change Dir option, and press Enter. The Change Directory dialog box appears. Type C:\MEGA\EXAMPLES, and press Enter. Now with the Tab key, go to the OK button and press Enter. This sets C:\MEGA\EXAMPLES as the current directory.
In this example we will use the data present in the HUMDIST.MEG file. Let us examine the content of this file before proceeding further. Since we do not intend to edit this file, we will use the file browsing command.
Ex 1.2.1The Browse command is present in the File menu. So, first unfold the File menu (follow Ex 1.1.1) and then use the down arrow key ( ↓ ) to go to the Browse option and press Enter. The File Name dialog box appears. Type HUMDIST.MEG, and press Enter.
Ex 1.2.2The HUMDIST.MEG is displayed in a window with a double line border. (Note the presence of two icons on the top border of this window. They are for use with the mouse.) Examine the contents of the file, and note the presence of the #mega format specifier, a title, OTU names, and the lower-left triangular distance matrix.
Ex 1.2.3Now close this file before proceeding for analysis by selecting the Window|Close command.
A data file must be activated before the analysis can be performed. (Remember that opening a file for browsing or editing is different from activating it for analysis.) Let us activate the HUMDIST.MEG data file now.
Ex 1.3.1Press F10 to go to the main menu. Using the arrow keys, move the slide bar to the Data menu and press Enter. The Data menu contains many commands. Use arrow keys to go to the Open Data command and press Enter. A submenu with four options will appear.
Ex 1.3.2Of the four options available, choose the Distance option, and press Enter. A File Name dialog box will appear. Type HUMDIST.MEG, and press Enter.
Ex 1.3.3This produces an Input Data dialog box that inquires about the input distance data format. Using the Tab key, select the Lower-left triangular-matrix option, and press Enter. The message "Reading input data. Please Wait!" will appear.
Ex 1.3.4Since this data file does not contain any errors, no error messages are flashed. Do you see any change on the screen? A box labeled Current Data appears that contains information about the input data just activated for analysis. A Selections box also appears on the screen that informs you regarding the current analysis methods chosen.
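The lower-left triangular layout chosen in Ex 1.3.3 stores each pairwise distance only once. As a minimal sketch of what that layout implies, the Python code below expands such rows into a full symmetric matrix. The function name full_matrix and the four-OTU distances are hypothetical illustrations; they are not MEGA code and not the HUMDIST.MEG values.

```python
def full_matrix(lower_rows):
    """Expand lower-left triangular rows into a full symmetric matrix.

    lower_rows[k] holds the distances from OTU k+1 to OTUs 0..k,
    so the first OTU has no row of its own in this sketch.
    """
    n = len(lower_rows) + 1
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower_rows, start=1):
        if len(row) != i:
            raise ValueError(f"row {i} should hold {i} values, got {len(row)}")
        for j, value in enumerate(row):
            # Each distance is mirrored across the diagonal.
            matrix[i][j] = matrix[j][i] = float(value)
    return matrix


# Hypothetical distances for four OTUs (assumed values for illustration).
rows = [[0.1], [0.2, 0.3], [0.4, 0.5, 0.6]]
matrix = full_matrix(rows)
for line in matrix:
    print(" ".join(f"{x:.1f}" for x in line))
```

The length check mirrors the kind of format error a reader of such a file would have to reject: a row with too few or too many entries cannot belong to a triangular matrix.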
Let us make a phylogenetic tree from the distance data. For this purpose, first select a tree-building method and then use the Construct Tree(s) command.
Ex 1.4.1The Phylogeny menu contains the Neighbor-joining command. Select this command, and press Enter.
Ex 1.4.2Choose the Construct Tree(s) option from the Phylogeny menu, and press Enter. The message "The tree is being reconstructed. Please Wait!" is displayed.
Ex 1.4.3A neighbor-joining tree is displayed on the screen instantly. Examine the tree on the screen, and press Esc key to remove it.
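For readers curious about what happens between Ex 1.4.2 and Ex 1.4.3, the following Python sketch implements the textbook neighbor-joining procedure on a small distance matrix. It is an illustration under stated assumptions, not MEGA's implementation: the function name neighbor_joining, the output layout, and the five-OTU distances are all hypothetical, and ties between candidate pairs are broken arbitrarily.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor-joining on pairwise distances.

    dist maps each unordered pair (a, b) of names to its distance.
    Returns (joins, last): joins lists (a, b, len_a, len_b, new_node)
    for every join, and last is the final branch (a, b, length).
    """
    d = {}
    for (a, b), value in dist.items():
        d[a, b] = d[b, a] = float(value)
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other nodes.
        r = {a: sum(d[a, b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        len_a = 0.5 * d[a, b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        u = f"({a},{b})"
        # Distances from the new interior node to the remaining nodes.
        for c in nodes:
            if c not in (a, b):
                d[u, c] = d[c, u] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, len_a, len_b, u))
    a, b = nodes
    return joins, (a, b, d[a, b])


# Hypothetical distances for five OTUs (not the HUMDIST.MEG values).
names = ["a", "b", "c", "d", "e"]
dist = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
joins, last = neighbor_joining(names, dist)
for a, b, len_a, len_b, u in joins:
    print(f"join {a} ({len_a:.2f}) and {b} ({len_b:.2f}) into {u}")
```

With these assumed distances the first join pairs a and b, with branch lengths of 2 and 3.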
With this, let us end this session of MEGA.
Ex 1.5.1Go to the Data menu, select the Close Data command, and press Enter. The program inquires if data are to be inactivated. Press Enter; the Current Data and Selections boxes disappear from the screen.
Ex 1.5.2To exit MEGA, press Alt + X or select the Exit MEGA command from the File menu.

7.2 Computing Statistical Quantities for Nucleotide Sequences

In this exercise, the use of the Data|Data Presentation command for computing various statistical quantities of nucleotide sequences is illustrated. In addition, short-cuts for frequently used commands, the method of accessing on-line help, and the reason why some commands are enabled and others are disabled are explained.

Ex 2.0.1Go to the C:\MEGA directory first, type MEGA on the C:\MEGA> DOS prompt, and press Enter. Press Enter in the Welcome box that appears on the screen.
Now, set C:\MEGA\EXAMPLES as the default directory (see Ex 1.1.1 - Ex 1.1.2). Let us examine the contents of the file DROSOADH.MEG by using the hot-key for the File|Browse command.
Ex 2.1.1Press F5. This brings up a File Name dialog box. Type DROSOADH.MEG, and press Enter. The file will appear on the screen in a window with a double line border. Press F1, the help key, to open the help, and after a quick glance at the help text, press Esc or click on the filled square icon in the top left corner to put the help window away.
Ex 2.1.2Examination of the DROSOADH.MEG file reveals the presence of the #mega format specifier, a title, OTU names, and the interleaved sequence data.
Ex 2.1.3Let us close this file by pressing Alt + F3 (short-cut for the Window|Close command).
Before activating the data file DROSOADH.MEG for analysis, let us try the Data|Close Data command displayed in the light shade of gray. Can you select this command? No, because no data file is currently active. Isn't the Open Data command displayed in a brighter color? The Open Data command is enabled, but the Close Data command is disabled and is not selectable.

For studying statistical quantities of the data present in the file DROSOADH.MEG, we first activate it.
Ex 2.2.1Select the Data|Open Data command, and choose DNA from the resulting menu. Type DROSOADH.MEG in the File Name box and press Enter.
Ex 2.2.2A dialog box appears where the noninterleaved (continuous) format is selected. Use the Tab and arrow keys to choose the interleaved format. Everything seems alright. Press Enter or click on the OK button.
Ex 2.2.3The message "Reading input data. Please Wait!" appears. Soon after, the program inquires whether the data are protein-coding or not. Press Y to select the Protein-coding mode. For the genetic code table to be used, select the "Universal" option and press Enter. The Current Data and the Selection boxes appear on the screen.
Now examine the Data menu again. The Close Data command is enabled, displayed in a bright color, and the Open Data command is disabled (try to select any data type in its submenu). The Close Data command is enabled because some data are active, whereas the Open Data command is disabled because it is not possible to activate more than one data set at any one time in MEGA.

Let us take a look at the data by using the Data Presentation command and compute some basic statistics for these data.
Ex 2.3.1Select the Data|Data Presentation command. The message "Sequence data in preparation. Please wait!" appears. The sequences are then displayed on the screen.
Ex 2.3.2DNA sequences are displayed on the screen with the cursor on the first site of the first sequence. Use the right arrow ( → ) and left arrow ( ← ) keys to move from site to site and note the change in the Site# display in the bottom-right corner. Use the up ( ↑ ) and down ( ↓ ) arrow keys to move between OTUs and note the changes in the OTU Name view on the top panel. The Total Sites view on the bottom panel displays the sequence length at all times, and the Marked Sites view displays 0 because no special site attributes are marked yet.
Ex 2.3.3To highlight variable sites, press V (or click on the button marked V). All sites that are variable are highlighted, and the number in the Marked Sites display changes. Press V again. The sites return to the normal color and Marked Sites display shows 0 again.
Ex 2.3.4Now highlight the parsimony-informative sites and the 2-fold and 4-fold redundant sites using their respective buttons. (Read about these buttons by pressing the help key, F1.)
Ex 2.3.5To compute the nucleotide base frequencies, nucleotide pair frequencies, and the codon usage bias, we use the Statistics command. Press S (or click on the S button). From the dialog box, select the All OTUs option for nucleotide frequencies, nucleotide pair frequencies, and codon usage and the Overlapping option for the Variability by using the Tab and arrow keys, and press Enter. Type C:\NUCSTAT.OUT in the Output File box when the program asks for an output file name.
Ex 2.3.6To examine the statistical quantities computed in the previous step, press Esc to remove the Sequence Data window. Then use the File|Edit|Open File command (F3) to see the C:\NUCSTAT.OUT file.
Ex 2.3.7Now again display the sequence data on the screen by using the Data|Data Presentation command (or use the hot-key F4).
Ex 2.3.8Since the data are in the protein-coding mode, they can be translated into amino acid sequences. To do this, press T. The DNA sequences are now replaced by the amino acid sequences. Note that the commands for highlighting 2- and 4-fold redundant sites are no longer enabled.
Ex 2.3.9Now use the Statistics command to compute the amino acid frequencies. For the output file name, type C:\AMINOSTAT.OUT. Before examining the output from this operation, press T to restore the nucleotide sequences to the screen.
Ex 2.3.10As usual, press Esc to remove the displayed data and use F5 to examine the C:\AMINOSTAT.OUT file.
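The site attributes and frequencies used in Ex 2.3.3 to Ex 2.3.5 can be illustrated with a short Python sketch. It assumes the usual definitions: a site is variable if it shows more than one state, and parsimony-informative if at least two states each occur in at least two sequences. The function names and the four toy sequences are hypothetical, every character is treated as a state (gaps and missing data are not handled), and site positions are 0-based here rather than numbered from 1 as on the MEGA screen.

```python
from collections import Counter


def variable_sites(seqs):
    """Return 0-based positions where the sequences show more than one state."""
    return [i for i, col in enumerate(zip(*seqs)) if len(set(col)) > 1]


def parsimony_informative_sites(seqs):
    """Return 0-based positions with at least two states seen at least twice."""
    sites = []
    for i, col in enumerate(zip(*seqs)):
        counts = Counter(col)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites


def base_frequencies(seq):
    """Return the relative frequencies of A, C, G, and T in one sequence."""
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


# Hypothetical aligned sequences (not taken from DROSOADH.MEG).
seqs = ["ACGTAC", "ACGTTC", "ATGAAC", "ATGAAG"]
print("variable:", variable_sites(seqs))
print("parsimony-informative:", parsimony_informative_sites(seqs))
print("frequencies:", base_frequencies(seqs[0]))
```

In the toy alignment, every parsimony-informative site is also variable, but a site where a single sequence differs from all the others is variable without being informative.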
To inactivate the currently used data and exit MEGA, press Alt + X. You simply come out of MEGA. Did you realize that we did not inactivate the data file before exiting MEGA? You don't need to do it because it is automatically done by the program.

7.3 Estimating Evolutionary Distances from Nucleotide Sequences

In this example we compute various distances for the Adh sequences from 11 Drosophila species (Thomas and Hunt 1993). We used these data in the previous example to study various sequence statistics. In addition, you will see how these distances can be written in a file in various formats through options for page size, precision, and relative placement of distances and their standard errors.

Ex 3.0.1Go to the C:\MEGA directory first, type MEGA on the C:\MEGA> DOS prompt, and press Enter. Now again press Enter in the Welcome box that appears on the screen.
As usual, set C:\MEGA\EXAMPLES as the default directory using the File|Change Dir command. Now activate the data file DROSOADH.MEG using the instructions given in Ex 2.2.1 - Ex 2.2.3.

The computation of distances from nucleotide sequences is a two-step process. First, select an appropriate distance estimation method in the Distance menu. The distances are then computed by using the Compute Distances command, which is also available in the Distance menu.

Now look at the Current Data box present at the lower right corner of the screen. It indicates that the data are being used in the coding mode. At this time, go to the Distance menu (Alt + T), and note that all distance estimation methods in submenus Nucleotide, Syn-nonsynonymous, and Amino Acid are displayed in a bright shade (enabled commands). If you are analyzing noncoding sequences, only the Nucleotide submenu will contain enabled commands, and the Syn-nonsynonymous and Amino Acid submenus will contain disabled commands.

Let us begin by computing the proportion of nucleotide differences between each pair of Adh sequences.
Ex 3.1.1Select the Distance|Nucleotide command (Alt + T, N). From the submenu, select the p-distance. This produces a box with four options (to learn about these options, press F1). Just press Enter to select the default option.
Ex 3.1.2Look at the Selection box on the screen. It shows that you have chosen the p-distance.
Ex 3.1.3Now select the Distance|Compute Distances command. This command will produce a dialog box with many options. At this moment, just press Enter to accept all default options.
Ex 3.1.4The message "Pairwise distances are being estimated. Please wait!" appears. Once all the distances are computed, the program requests a file name to output these distances. For now just type C:\PDIST.OUT.
Ex 3.1.5Use the File|Browse command to examine the distance output file.
Now you know how to compute distances. So let us compute distances using some other methods and compare them with each other.
Ex 3.2.1Select the Distance|Nucleotide command. From the submenu, select the Jukes-Cantor Distance. Now select the Distance|Compute Distances command. Just press Enter to accept all the default options in the resultant dialog box. Once the distances are computed, supply C:\JCDIST.OUT as the file name to write the distances estimated.
Ex 3.2.2Follow the steps Ex 3.1.1 - Ex 3.1.4 and compute the Tamura Distance. For the file name, type C:\TAMDIST.OUT.
Ex 3.2.3By this time you have three files containing the distances estimated by three different methods. You can now compare these distances on the screen by pressing the hot-key F5 three times for the three files created above.
Ex 3.2.4For an easy comparison, use the Window|Tile command to arrange multiple files on the screen.
Ex 3.2.5Now remove all these files from the screen by pressing Alt + F3 three times.
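To see why the p-distance and Jukes-Cantor files differ, the sketch below computes both quantities for one pair of sequences in Python. It uses the standard definitions: p is the proportion of sites that differ, and the Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3). The function names and the two sequences are hypothetical, and the code ignores gaps and missing data, which MEGA handles through its own options.

```python
import math


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def jukes_cantor(p):
    """Jukes-Cantor distance for a nucleotide p-distance below 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical sequences (not taken from DROSOADH.MEG).
seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTTT"
p = p_distance(seq1, seq2)
d = jukes_cantor(p)
print(f"p-distance = {p:.4f}, Jukes-Cantor distance = {d:.4f}")
```

Because the correction accounts for multiple substitutions at the same site, the Jukes-Cantor value is never smaller than p, and the gap between the two grows as p increases.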
The file DROSOADH.MEG contains nucleotide sequence data, and we have computed nucleotide distances from these data. Let us now compute the proportion of amino acid differences. Note that MEGA will automatically translate the nucleotide sequences into amino acid sequences using the selected genetic code table.
Ex 3.3.1Select the Distance|Amino Acid command (Alt + T, A). From the submenu, select the p-distance.
Ex 3.3.2Look at the Selection box on the screen. It shows that you have chosen the amino acid p-distance.
Ex 3.3.3Now select the Distance|Compute Distances command. This command will produce a dialog box with many options. Use the Tab key to go to the Estimate option in this dialog box and select Distances and SE's. In this dialog box, note that the Write Distances and Write Standard Errors options show different selections. This means that the distances and their standard errors will be written on the opposite sides of the output matrix. In any case, just press Enter to accept the settings.
Ex 3.3.4The message "Pairwise distances are being estimated. Please wait!" appears. Once all the distances are computed, the program requests a file name to output these distances. For now just type C:\PAMINO.OUT.
Ex 3.3.5Use the File|Browse command to examine the distance output file. In contrast to previous files, this file contains both the distances and their standard errors.
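The translation and the amino acid p-distance with its standard error can be sketched in Python as follows. The sketch assumes the universal (standard) genetic code and the binomial standard error sqrt(p(1 - p)/n), where n is the number of amino acid sites compared; the formula MEGA itself applies is described in the manual's methods chapters, not here. The function names and the two short coding sequences are hypothetical.

```python
import math
from itertools import product

# Universal genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODE[seq[i:i + 3]] for i in range(0, usable, 3))


def amino_p_distance(dna1, dna2):
    """Return (p, standard error) for the translated sequences."""
    aa1, aa2 = translate(dna1), translate(dna2)
    if len(aa1) != len(aa2):
        raise ValueError("sequences must be aligned to the same length")
    n = len(aa1)
    p = sum(1 for x, y in zip(aa1, aa2) if x != y) / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se


# Hypothetical coding sequences (not taken from DROSOADH.MEG).
dna1 = "ATGGCCAAA"
dna2 = "ATGGACAAA"
p_aa, se_aa = amino_p_distance(dna1, dna2)
print(translate(dna1), translate(dna2))
print(f"p = {p_aa:.4f}, SE = {se_aa:.4f}")
```

A pair of values such as p and SE is what the Distances and SE's option writes for every pair of OTUs.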
In the previous steps, we chose the default option where distances and their standard errors were written on the opposite sides of a matrix, and the distance matrix was fragmented in many parts because it did not fit on one page. Let us write these estimates in the distance ± standard error format in one single matrix.
Ex 3.4.1Look at the Selection box on the screen. It shows that you have chosen the amino acid p-distance. So we do not need to choose the distance estimation method again.
Ex 3.4.2Select the Distance|Compute Distances command. This command will produce a dialog box where the Distances and SE's option is already selected. (MEGA remembers your previous selections.) Now go to the Write Standard Errors option with the help of the Tab key and use the arrow keys to choose the Upper-right matrix. At this time, the Write Distances and Write Standard Errors options show the same selection. This means that the distances and the standard errors will be written on the same side of the matrix. To write the complete matrix on one page, go to the Page size option by using the Tab key and specify a page size of 1000. Large page sizes ensure that the distance matrix will not be fragmented. Finally, just press Enter to accept the settings.
Ex 3.4.3The message "Pairwise distances are being estimated. Please wait!" appears. Once all the distances are computed, the program requests a file name to output these distances. For now just type C:\PAMINO.OUT. The program will ask whether you want to overwrite the file. Press Enter to say Yes.
Ex 3.4.4Use the File|Browse command to examine the distance output file. In contrast to the previous files, this file contains both the distances and their standard errors in the desired format. Now close this file. Let us inactivate the currently used data set and end the current session of MEGA by pressing the hot-key Alt + X.

7.4 Constructing Trees and Selecting OTUs from Nucleotide Sequences

The CRAB.MEG file contains nucleotide sequences for the large subunit mitochondrial rRNA gene from different crab species (Cunningham et al. 1992). Since the rRNA gene is transcribed but not translated, it is in the category of non-coding genes. Let us use this data file to illustrate the procedures of building trees and in-memory sequence data editing using the commands present in the Data and Phylogeny menus.
Ex 4.0.1Go to the C:\MEGA directory first, type MEGA on the C:\MEGA> DOS prompt, and press Enter. Now again press Enter in the Welcome box that appears on the screen.
Now set C:\MEGA\EXAMPLES as the default directory using the File|Change Dir command, and examine the contents of the CRAB.MEG file (use hot-key F5). In this data file, note the comments starting on the third line. The comments indicate that the data are in the noninterleaved format and that '?' and '-' are used to designate missing-information and alignment gap sites. Close this file using Alt + F3. Let us activate the crab sequence data for analysis.
Ex 4.1.1Select the Data|Open Data command and choose the DNA option. In the File Name dialog box, type CRAB.MEG and press Enter.
Ex 4.1.2A dialog box will appear. Use the Tab key to move around in the dialog box but do not change anything. In this box the noninterleaved format is selected and the symbols used for missing-information data, identical sites, and alignment gap are '?', '.', and '-', respectively. So everything is fine. Just press Enter (or click on the OK button).
Ex 4.1.3A status report box informs that the data are being read. At this stage, the program inquires whether the nucleotide sequence data are from a coding or noncoding gene. Select the noncoding mode by pressing the N key. The Current Data and the Selection windows appear on the screen.
The use of Data|Data Presentation command was introduced in the second example. As an exercise, you may try to examine this data set on the screen by using that command. Just press F4, the hot-key for the Data Presentation command, and you will see the data on the screen. For help, press F1 any time.

Let us start by building a neighbor-joining tree. For this purpose, we need to specify a distance estimation method in the Distance menu and a tree building method in the Phylogeny menu. The Phylogeny|Construct Tree(s) command is then used for tree building.
Ex 4.2.1Select the Distance|Nucleotide command. Choose the Jukes-Cantor Distance from the resultant submenu.
Ex 4.2.2To use the neighbor-joining method for tree building, select the Phylogeny|Neighbor-Joining command.
Ex 4.2.3Invoke the Phylogeny|Construct Tree(s) command. This brings a status report box with a message. The neighbor-joining tree will soon appear on the screen.
Ex 4.2.4At this moment, you are automatically put into the phylogenetic-tree editor. This editor provides operations in two modes: view mode and edit mode. The edit mode can be recognized by the presence of a blinking cursor. By default you are placed in the view mode. Press E to enter the edit mode (a blinking cursor will appear).
Ex 4.2.5Use the arrow keys ( ↑, ↓, →, ← ) to move to different branches on the tree and note the change in the branch length in the lower-left corner corresponding to the focused branch. Now, position your cursor on the far left corner of the screen.
Ex 4.2.6At this time the cursor assumes a triangular shape instead of the diamond ( ♦ ). Press M; the mirror image of the original tree is displayed instantly. Press M again, and the tree reverts to its original shape.
Ex 4.2.7Press the Up arrow key ( ↑ ) just once. The cursor moves upwards to the next branch. Press F, the Flip command. A mirror-like effect is produced on the sub-tree anchored on the currently focused branch.
Ex 4.2.8The Topology command displays just the branching pattern of the tree. Press T, the Topology command; the branching pattern (without actual branch lengths) is displayed on the screen. Press T again, and the actual NJ tree reappears.
Ex 4.2.9Press F1 to examine the help for the tree editor. Use the Tab key to get to the highlighted word Swap and press Enter. You will see information about the Swap command. Help for the other commands can be reached in the same way. Press Esc to exit help.
Ex 4.2.10DO NOT remove the tree from the screen. We shall use it for illustrating how a tree can be printed.
At this moment we have the NJ tree on the screen. In MEGA you can print this tree by using a printer. Let us see how.
Ex 4.3.1You can print a tree in two ways. First, a tree can be written as an ASCII-text file. In this case, an exact replica of the tree displayed on the screen is written in the desired file. Since the NJ and UPGMA trees are shown with approximate branch lengths, this output does not reflect true branch lengths. By contrast, if you have a printer attached to your computer, you can print the tree with exact branch lengths.
Ex 4.3.2Press P, the Print command. A dialog box with two options appears. If there is no printer attached to your computer, select the ASCII-Text file output option using the Tab and arrow keys, and then press Enter. For the output file name, type C:\TREE.NJ. If you have a printer attached to your computer, select the Printer option and press Enter. A dialog box appears on the screen. In this dialog box many options are available. (Press F1 to learn about them.)
Ex 4.3.3Do not change anything in this dialog box, and just select the Preview command using the Tab key. A graphic image of the tree will be displayed on the screen. Press Enter, and you are back to the option box. Now go to Write information option, and select the Branch lengths. Again select the Preview command (you may press Alt + V). The tree is now drawn with branch lengths. Press Enter to come out of the graphics image.
Ex 4.3.4To print the tree with a printer, select an appropriate printer using the Printer command.
Ex 4.3.5Press Enter (or click on OK) to print the tree on the selected printer.
Ex 4.3.6Press Esc to exit the phylogenetic-tree editor.
In MEGA, you can also construct maximum parsimony trees. Let us construct a maximum parsimony tree(s) by using the branch-and-bound search option.
Ex 4.4.1Select the Phylogeny|Maximum Parsimony command. In the resultant submenu, choose the Branch-and-Bound Search option.
Ex 4.4.2Invoke the Phylogeny|Construct Tree(s) command, and press Enter to accept default options in the dialog box produced. This brings a status report box. An MP tree appears on the screen as soon as the search is completed.
Ex 4.4.3Note that no branch lengths are given for an MP tree in MEGA. Also note that the Topology command is disabled because in this case only the branching pattern is available.
Ex 4.4.4Now print this tree (See Ex 4.3.1 - 4.3.5). You do not have to specify the printer name again because MEGA remembers your selection.
Ex 4.4.5Press Esc to exit the phylogenetic tree editor.
Ex 4.4.6Compare the NJ and MP trees. For this data set, the branching pattern of these two trees is identical.
As an exercise, use the Heuristic Search for finding the MP tree. In this example, you will find the same tree as that obtained by the branch-and-bound method if you use the default option (search factor equal to 2 for all steps of OTU addition). However, the computational time will be much shorter. Actually, in this example even a search factor equal to 0 will recover the MP tree.

We will now examine how some data editing features work in MEGA. For noncoding sequence data, OTUs as well as sites can be selected for analysis. Let us remove the first OTU from the current data set.
Ex 4.5.1Select the Data|Select OTUs command. A Select OTUs list dialog box is displayed.
Ex 4.5.2All the OTU labels are checked ( √ ) in this box. This indicates that all OTUs are included in the current active data subset. To remove the first OTU from the data, press the Del key (or double click on the first OTU). The first OTU is no longer checked. Press Enter.
Ex 4.5.3Note a change in the Used OTUs entry in the Current Data window. The number of OTUs used for analysis has been reduced by one.
Ex 4.5.4Again use the Data|Data Presentation command (F4) to see the changes made.
Now, construct a neighbor-joining tree from this data set (Ex 4.2.3) that contains 12 OTUs instead of 13. Let us inactivate the currently used data set and end the current session of MEGA by pressing the hot-key Alt + X.

7.5 Tests of the Reliability of a Tree Obtained

In this example, we will conduct two different tests using mitochondrial 12S rRNA gene sequences from 12 flightless birds (ratites) and one related species (Cooper et al. 1992) and learn how to construct a condensed tree.

Ex 5.0.1Go to the C:\MEGA directory first, type MEGA on the C:\MEGA> DOS prompt, and press Enter. Now again press Enter in the Welcome box that appears on the screen.
Set C:\MEGA\EXAMPLES as the default directory and browse through the file RATITE.MEG.

Activate the data present in the RATITE.MEG file by using the Data|Open Data command and accepting the default options. This gene does not code for a protein, so choose the noncoding mode.

Let us start with the bootstrap test for the neighbor-joining tree. For this purpose, we need to specify a distance estimation method in the Distance menu and a tree building method in the Phylogeny menu. The Phylogeny|Bootstrap Test command is then used for performing a bootstrap test.
Ex 5.1.1Select the Distance|Nucleotide command. Choose the Jukes-Cantor Distance from the resultant menu.
Ex 5.1.2To use the neighbor-joining method for tree building, select the Phylogeny|Neighbor-Joining command.
Ex 5.1.3Invoke the Phylogeny|Bootstrap Test command. This produces a dialog box with many options. Just press Enter. The program will ask about the filename to store some information from bootstrap test. Just press Enter at this time. The test begins, and you can see its progress on the screen. The neighbor-joining tree with bootstrap confidence limits (BCL) appears on the screen in the phylogenetic tree editor.
Ex 5.1.4Press E to go to the Edit mode. A blinking cursor will appear.
Ex 5.1.5Use the arrow keys ( ↑, ↓, →, ← ) to move to different branches on the tree and note the change in the branch length and BCL values in the lower-left corner.
Ex 5.1.6Let us make a condensed tree. For this purpose, we will use the Cut-Off command. Press O, and you will be asked for a cut-off level. Type 70 in the box and press Enter. The condensed tree is produced on the screen. This tree shows all the branches that are supported at BCL ≥ 70%. Press O again, and the actual NJ tree will reappear.
Ex 5.1.7Print this tree to the printer (see Ex 4.3.1 - Ex 4.3.6) with BCL values selected in the Write information option in the tree printing dialog box.
Ex 5.1.8Press Esc to exit the tree editor.
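The idea behind the bootstrap test in Ex 5.1.3 and the cut-off in Ex 5.1.6 can be sketched in Python. Each replicate resamples the alignment columns with replacement, and the support for a grouping is the percentage of replicates in which that grouping is recovered. This is a simplified illustration under stated assumptions: in a real test a tree is rebuilt for every replicate, whereas here the hypothetical check first_two_closest stands in for "the tree groups the first two OTUs". All names and the toy sequences are hypothetical and unrelated to RATITE.MEG.

```python
import random


def bootstrap_replicate(seqs, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n_sites = len(seqs[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(s[c] for c in cols) for s in seqs]


def n_differences(a, b):
    """Count the sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def support_percent(seqs, grouping_found, n_reps=100, seed=1):
    """Percentage of replicates in which grouping_found(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_reps)
               if grouping_found(bootstrap_replicate(seqs, rng)))
    return 100.0 * hits / n_reps


def first_two_closest(rep):
    """Stand-in for a tree check: OTU 0 is closer to OTU 1 than to OTU 2."""
    return n_differences(rep[0], rep[1]) < n_differences(rep[0], rep[2])


# Hypothetical toy alignment in which the grouping is never in doubt.
toy = ["AAAA", "AAAA", "CCCC"]
bcl = support_percent(toy, first_two_closest)
kept_in_condensed_tree = bcl >= 70  # same threshold as the cut-off level of 70
print(f"support = {bcl:.1f}%, kept at 70% cut-off: {kept_in_condensed_tree}")
```

A branch whose support falls below the chosen cut-off is the kind of branch that the condensed tree collapses.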
For neighbor-joining trees, it is possible to conduct the standard error test for every interior branch by using the Phylogeny|Standard Error Test command. In MEGA this test is available for the p-distance, Jukes-Cantor distance, and Kimura's 2-parameter (s + v) distance for nucleotide sequences. Since we did the above analysis for the Jukes-Cantor distance, we will use the same distance estimation method to compare the results from the bootstrap and standard error tests. Since the Selections box shows that the Jukes-Cantor distance and the NJ tree-making method are already selected, we just have to invoke the Phylogeny|Standard Error Test command.
Ex 5.2.1Go to the Phylogeny menu and select the Standard Error Test command. This produces a dialog box that shows that the Complete-Deletion option will be used for missing-information and alignment gap sites. Press Enter to start the test, and you will see its progress on the screen.
Ex 5.2.2The neighbor-joining tree with confidence probabilities (CP) from the standard error test of branch lengths is displayed on the screen.
Ex 5.2.3Compare the CP values on this tree with the BCL values of the tree that you printed in the previous procedure.
Now exit MEGA using the Alt + X command.

7.6 Test of Positive Selection

In this example, various analyses of protein-coding nucleotide sequences for five alleles from the human HLA-A locus (Nei and Hughes 1991) are presented.

Ex 6.0.1Go to the C:\MEGA directory first, type MEGA on the C:\MEGA> DOS prompt, and press Enter. Now again press Enter in the Welcome box that appears on the screen.
Set C:\MEGA\EXAMPLES as the default directory and browse through the file HUMHLA.MEG. In this file, sequences are arranged in the interleaved (block-wise) format. Note that the antigen recognition sites (ARS) are marked in comments. For analyzing the data present in file HUMHLA.MEG, we first activate the data.
Ex 6.1.1Select the Data|Open Data command, and choose DNA from the resulting menu. Type HUMHLA.MEG in the File Name box and press Enter.
Ex 6.1.2A dialog box appears where the noninterleaved (continuous) format is selected. Use the Tab and arrow keys to choose the interleaved format. Everything seems alright. Press Enter or click on the OK button.
Ex 6.1.3The message "Reading the input data file. Please Wait!" appears. Soon after, the program inquires whether the data is protein-coding or not. Press the Y key to select the Protein-coding option. For the genetic code table to be used, select the "Universal" option, and press Enter. The Current Data and the Selection boxes appear on the screen.
Now to study positive Darwinian selection for HLA-A alleles, we need to select all codons that are involved in the antigen recognition sites. These codons are shown with a plus sign ( + ) in the HUMHLA.MEG data file. For this, we need to use the Select Sites/Codons command.
Ex 6.2.1Select the Data|Select Sites/Codons command and choose the Individual option. A Select Codons box appears with a list of codon numbers.
Ex 6.2.2By default all the codons are checked ( √ ) in this list, indicating that all of them are included in the currently active data set. To remove any codon from the data, press the Del key (or double click). Press Del on all numbers except 5, 7, 9, 22, 24, 26, 57, 58, 59, 61-77, 80-82, 84, 95, 97, 99, 114, 116, 143, 145-147, 149-152, 154-159, 161-163, 165-167, 169, and 171. Now press Enter.
Ex 6.2.3Note a change in the Used Codons entry in the Current Data window. This number must be 57. If it is not, go back to Ex 6.2.1 and check.
Ex 6.2.4Now use the Data|Data Presentation command to see the selected data subset. Here you can check if the correct codons are included in the data set or not.
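The codon list in Ex 6.2.2 is easy to mistype, so it can help to confirm independently that it contains the 57 codons expected in Ex 6.2.3. The Python sketch below expands the ranges and counts the entries; the helper name expand is hypothetical and the list is copied from Ex 6.2.2.

```python
def expand(spec):
    """Expand a comma-separated list of numbers and lo-hi ranges."""
    codons = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            codons.extend(range(lo, hi + 1))
        else:
            codons.append(int(part))
    return codons


# Antigen recognition site codons, as listed in Ex 6.2.2.
ARS = ("5, 7, 9, 22, 24, 26, 57, 58, 59, 61-77, 80-82, 84, 95, 97, 99, "
       "114, 116, 143, 145-147, 149-152, 154-159, 161-163, 165-167, 169, 171")
ars_codons = expand(ARS)
print(len(ars_codons))
```

The count agrees with the value of 57 that the Used Codons entry should show.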
Let us compute the synonymous and nonsynonymous distances appropriate for studying positive Darwinian selection in this set of antigen recognition codons. For this, you must first specify the distance measure and then use the Compute Distances command.
Ex 6.3.1Select the Distance|Syn-Nonsynonymous command. Choose the Jukes-Cantor Correction from the resultant menu. A dialog box appears. Select the Synonymous option.
Ex 6.3.2Now select the Distance|Compute Distances command. In the dialog box, select the Distance and SE's option and also select the Compute overall mean option. We need to do this to obtain all the pairwise synonymous distances and the average synonymous distance and the standard error of this average. (Please read the manual to find the meanings of different options or use the F1 key to get help.) Now press Enter. "Pairwise distances are being estimated. Please Wait" appears.
Ex 6.3.3Once the distances are calculated, an output file name with correct path is required to save the distances. Type C:\SYN.DAT and press Enter. Distances are output to this file. You may use the file browsing command to examine this file. (The average synonymous distance and its standard error should be 0.0618 and 0.0262, respectively.)
Ex 6.3.4Now we need to compute the average nonsynonymous distance and its standard error. For this purpose, we repeat the process shown in Ex 6.3.1 - Ex 6.3.3 but for nonsynonymous distances this time. That is, select the Distance|Syn-Nonsynonymous option, choose the Jukes-Cantor Correction from the resultant menu, and select the Nonsynonymous option from the dialog box.
Ex 6.3.5Now select the Distance|Compute Distances command. A dialog box appears. In this dialog box, the Distance and SE's and Compute overall mean options are already selected. Now press Enter. "Pairwise distances are being estimated. Please Wait" appears.
Ex 6.3.6Once the distances are calculated, an output file name with correct path is required to save the distances. Type C:\NONSYN.DAT and press Enter. Distances are output to this file. You may use the file browsing command to examine this file. (The average nonsynonymous distance and its standard error should be 0.1373 and 0.0231, respectively.)
Ex 6.3.7Now we have estimated the average synonymous and average nonsynonymous substitutions per site and the standard errors of these estimates. To conduct the test, refer to section 4.2 (equation 4.47). The difference in synonymous and nonsynonymous substitutions should come out to be significant at the 5% level. Now exit MEGA using the Alt + X command.
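As a worked illustration of the final step, the Python sketch below applies a normal-deviate (Z) test to the averages reported in Ex 6.3.3 and Ex 6.3.6. Equation 4.47 is not reproduced in this chapter, so the sketch assumes the common form in which the difference between the two averages is divided by the square root of the sum of their squared standard errors; the function name z_statistic is hypothetical.

```python
import math


def z_statistic(d_n, se_n, d_s, se_s):
    """Normal deviate for the difference between two independent estimates."""
    return (d_n - d_s) / math.sqrt(se_n ** 2 + se_s ** 2)


# Averages and standard errors quoted in Ex 6.3.3 and Ex 6.3.6.
d_s, se_s = 0.0618, 0.0262   # synonymous
d_n, se_n = 0.1373, 0.0231   # nonsynonymous
z = z_statistic(d_n, se_n, d_s, se_s)
significant = abs(z) > 1.96  # two-tailed 5% critical value
print(f"Z = {z:.2f}, significant at the 5% level: {significant}")
```

Under this assumed form the statistic exceeds 1.96, which is consistent with the statement that the difference is significant at the 5% level.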

