Importing Data From Other Formats

MEGA can convert files from several other formats into MEGA format. By default, the input format is inferred from the file extension. Supported formats include:

Extension	File type
.aln	CLUSTAL
.nexus	PAUP, MacClade
.phylip	PHYLIP Interleaved
.phylip2	PHYLIP Noninterleaved
.gcg	GCG format
.fasta	FASTA format
.pir	PIR format
.nbrf	NBRF format
.msf	MSF format
.ig	IG format
.xml	Internet (NCBI) XML format

The following sections briefly describe each of these formats and how MEGA handles their conversion.

COMMON FILE CONVERSION ATTRIBUTES

The default input format is determined by a file’s extension (e.g., a file with the extension “.ig” is initially assumed to be in “IG” input format). However, you can specify any format for any file; the extension serves only as an initial guide. Note that specifying an incorrect file format usually results in an erroneous conversion or another unexpected error.
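The extension-based default with a user override can be sketched as follows. This is only an illustration in Python, not MEGA's own code: the names DEFAULT_FORMATS and guess_format are hypothetical, only a few entries from the table above are reproduced, and treating extensions as case-insensitive is an assumption.

```python
import os

# A few extension/format pairs taken from the table above (not the full list).
# The dictionary and function names are hypothetical.
DEFAULT_FORMATS = {
    ".nexus": "PAUP, MacClade",
    ".phylip": "PHYLIP Interleaved",
    ".phylip2": "PHYLIP Noninterleaved",
    ".fasta": "FASTA",
    ".ig": "IG",
}


def guess_format(filename, override=None):
    """Return the format to assume for `filename`.

    An explicit `override` always wins; otherwise the extension is used
    as an initial guide. Returns None when the extension is not in the table.
    """
    if override is not None:
        return override
    # Assumption: extensions are compared case-insensitively.
    extension = os.path.splitext(filename)[1].lower()
    return DEFAULT_FORMATS.get(extension)
```

For example, guess_format("seqs.ig") yields "IG", while guess_format("seqs.ig", override="FASTA") yields "FASTA" regardless of the extension.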

Input files can include any of the following characters in their sequence data:

The letters a-z and A-Z, for DNA and protein sequences

Period (.)

Hyphen (-)

The space character

Question mark (?)

Depending on their context, all other characters encountered in input files are either ignored or are interpreted as specific non-sequence data, such as comments, headers, etc.
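The permitted character set listed above can be expressed as a simple membership test. The sketch below, in Python, only identifies characters that fall outside that set; how the converter then treats them (ignored, comment, header, and so on) depends on the input format and is not modeled here. The names SEQUENCE_CHARS and non_sequence_chars are hypothetical.

```python
import string

# Characters listed above as permitted in sequence data:
# letters a-z and A-Z, period, hyphen, space, and question mark.
SEQUENCE_CHARS = set(string.ascii_letters) | {".", "-", " ", "?"}


def non_sequence_chars(line):
    """Return, in order, the characters of `line` outside the permitted set."""
    return [ch for ch in line if ch not in SEQUENCE_CHARS]
```

A line such as "ACGT-acgt.? N" contains only permitted characters, whereas "ACG*T1" contains two characters ("*" and "1") that would be ignored or interpreted as non-sequence data.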

The first line of all converted files is always: #Mega

The second line of all converted files is always: !Title: <filename>

where <filename> is the name of the input file.

The third line of all converted files is blank.
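The three fixed header lines described above can be sketched as follows. This is an illustration in Python with a hypothetical function name, mega_header. The text does not say whether <filename> includes a directory path, so the sketch uses the string exactly as given.

```python
def mega_header(input_filename):
    """Return the first three lines of a converted file as a list of strings."""
    return [
        "#Mega",                       # line 1: always the same
        f"!Title: {input_filename}",   # line 2: title taken from the input file name
        "",                            # line 3: always blank
    ]
```

For an input file named seqs.ig, joining the result with newlines gives "#Mega", then "!Title: seqs.ig", then an empty line, after which the converted sequence data would follow.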

Many formats can specify the length of the sequences they contain. The MEGA conversion utility ignores this information and does not check whether the sequences are actually as long as declared.