Importing Data From Other Formats

MEGA can convert files from several other formats into the MEGA format. Each format is identified by its file extension. Supported formats include:

 

Extension    File type

.aln         CLUSTAL
.nexus       PAUP, MacClade
.phylip      PHYLIP Interleaved
.phylip2     PHYLIP Noninterleaved
.gcg         GCG format
.fasta       FASTA format
.pir         PIR format
.nbrf        NBRF format
.msf         MSF format
.ig          IG format
.xml         Internet (NCBI) XML format

 

The following sections briefly describe each of these formats and how MEGA handles their conversion.

 

COMMON FILE CONVERSION ATTRIBUTES

 

The default input format is determined by a file’s extension (e.g., a file with the extension ".ig" is initially assumed to be in "IG" input format). However, you can specify any format for any file; the file extension is used only as an initial guide. Note that specifying an incorrect file format usually results in an erroneous conversion or an unexpected error.
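The extension-based default described above can be sketched in Python as a simple lookup with a user override. This is only an illustration, not MEGA's own code: the names guess_input_format and override are hypothetical, the table holds just a few of the supported extensions, and treating extensions as case-insensitive is an assumption.

```python
import os

# A few of the extension-to-format pairs from the list of supported
# formats (deliberately not the full table).
DEFAULT_FORMATS = {
    ".nexus": "PAUP, MacClade",
    ".phylip": "PHYLIP Interleaved",
    ".phylip2": "PHYLIP Noninterleaved",
    ".fasta": "FASTA format",
    ".ig": "IG format",
}


def guess_input_format(filename, override=None):
    """Return the assumed input format for a file (hypothetical helper).

    The extension only supplies the initial guess; a format chosen
    explicitly by the user always takes precedence.
    """
    if override is not None:
        return override
    # Assumption: extensions are compared case-insensitively.
    extension = os.path.splitext(filename)[1].lower()
    # None means the extension is not recognized, so the user must choose.
    return DEFAULT_FORMATS.get(extension)


print(guess_input_format("primates.ig"))
print(guess_input_format("primates.txt", override="FASTA format"))
```

Returning None for an unknown extension, rather than picking a fallback format, mirrors the warning above: a wrong format choice tends to produce a bad conversion, so it is safer to ask.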

 

Input files may contain any of the following characters in their sequence data:

·      The letters: a-z,A-Z for DNA and protein sequences

·      Period (.)

·      Hyphen (-)

·      The space character

·      Question mark (?).

 

Depending on their context, all other characters encountered in input files are either ignored or are interpreted as specific non-sequence data, such as comments, headers, etc.
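As a rough illustration of the character rules above, the following Python sketch separates the accepted sequence characters from everything else in a line. The function name split_sequence_chars is hypothetical, and the sketch does not attempt the context-dependent step of deciding whether the remaining characters are ignored or treated as comments or headers.

```python
import string

# Characters listed above as valid in sequence data:
# letters, period, hyphen, space, and question mark.
SEQUENCE_CHARS = set(string.ascii_letters) | {".", "-", " ", "?"}


def split_sequence_chars(line):
    """Split a line into (accepted sequence characters, all other characters)."""
    accepted = "".join(ch for ch in line if ch in SEQUENCE_CHARS)
    other = "".join(ch for ch in line if ch not in SEQUENCE_CHARS)
    return accepted, other


accepted, other = split_sequence_chars("ACG-T?N.. 123*")
print(repr(accepted))  # 'ACG-T?N.. '
print(repr(other))     # '123*'
```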

 

The first line of all converted files is always: #Mega

The second line of all converted files is always: !Title: <filename>

where <filename> is the name of the input file.

The third line of all converted files is blank.
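The fixed three-line opening described above can be reproduced with a few lines of Python. This is a minimal sketch: the function name mega_header is hypothetical, and using only the base name of the input file (without its directory) in the title is an assumption.

```python
import os


def mega_header(input_path):
    """Build the three fixed opening lines of a converted file."""
    # Assumption: the title uses the file name without its directory.
    filename = os.path.basename(input_path)
    lines = [
        "#Mega",                 # line 1: always "#Mega"
        "!Title: " + filename,   # line 2: title taken from the input file name
        "",                      # line 3: always blank
    ]
    return "\n".join(lines) + "\n"


print(mega_header("primates.fasta"), end="")
```

The converted sequence data would follow directly after this header.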

 

Many formats can specify the length of the sequences they contain. The MEGA conversion utility ignores this information and does not verify that the sequences actually have the stated lengths.
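Because the declared lengths are not checked during conversion, you may want to verify them yourself beforehand. The Python sketch below shows one way to do so once the declared length and the sequences have been read from the file by other means. The function name find_length_mismatches is hypothetical, and not counting spaces toward the length is an assumption.

```python
def find_length_mismatches(declared_length, sequences):
    """Return {name: actual_length} for sequences that differ from the declared length.

    sequences maps each sequence name to its sequence string.
    """
    mismatches = {}
    for name, seq in sequences.items():
        # Assumption: spaces are layout only and do not count as sites.
        actual = len(seq.replace(" ", ""))
        if actual != declared_length:
            mismatches[name] = actual
    return mismatches


example = {
    "human": "ACGTACGT",
    "chimp": "ACGT AC-T",
    "gorilla": "ACGTAC",
}
print(find_length_mismatches(8, example))  # {'gorilla': 6}
```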