Converting MSF Format

These files consist of one or more groups of non-blank lines separated by one or more blank lines. The following is an example of one such group:

 ;G028uaah 240 bases

 G028uaah

 CATAAGCTCCTTTTAACTTGTTAAAGTCTTGCTTGAATTAAAGACTTGTT

 TAAACACAAAATTTAGACTTTTACTCAACAAAAGTGATTGATTGATTGAT

The first line of each group begins with a semicolon, and MEGA ignores it. The next line (G028uaah in the example above) is taken as the name of the sequence, and all subsequent lines up to the next semicolon line are taken as the sequence itself. For DNA and protein sequences, MEGA recognizes the letters a-z and A-Z and only a few special characters, such as period (.), hyphen (-), space, and question mark (?). All other characters in the input file are, depending on their context, either ignored or interpreted as non-sequence data such as comments or headers.
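The grouping rules above can be sketched as a small parser. This is a minimal illustration, not MEGA's own reader: the function name `parse_semicolon_groups` is hypothetical, and it assumes that every character other than letters, period, hyphen, and question mark (including spaces) can simply be dropped from sequence lines, whereas MEGA itself may interpret some of those characters as comments or headers.

```python
import re

# Assumption: keep only letters and the special characters . - ?
# Everything else (spaces included) is dropped from sequence lines.
_NOT_SEQUENCE = re.compile(r"[^A-Za-z.\-?]")


def parse_semicolon_groups(text: str) -> list[tuple[str, str]]:
    """Return (name, sequence) pairs from groups that start with a ';' line,
    followed by one name line and one or more sequence lines."""
    records: list[tuple[str, str]] = []
    name = None
    expecting_name = False
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate groups
        if line.startswith(";"):
            # A semicolon line closes the previous group and is itself ignored.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks, expecting_name = None, [], True
        elif expecting_name:
            # The first line after the semicolon line is the sequence name.
            name, expecting_name = line, False
        elif name is not None:
            # Every later line up to the next semicolon is sequence data.
            chunks.append(_NOT_SEQUENCE.sub("", line))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

Applied to the example group, this yields a single record named `G028uaah` whose sequence is the two 50-base lines joined together.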

A sequence in this format converts to the MEGA file format as follows (this output shows the layout using a different sequence, G019uabh):

 #mega

 !Title: filename

 #G019uabh

 ATACATCATAACACTACTTCCTACCCATAAGCTCCTTTTAACTTGTTAAA

 GTCTTGCTTGAATTAAAGACTTGTTTAAACACAAAAATTTAGAGTTTTAC
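The MEGA layout above can be produced from (name, sequence) pairs with a short writer. This is a sketch that mirrors the example only: the function name `to_mega` and the `title` and `width` parameters are hypothetical, the `!Title:` line copies the form shown above, and the 50-residue line width is simply the width used in the example, not a MEGA requirement.

```python
def to_mega(records: list[tuple[str, str]], title: str = "filename",
            width: int = 50) -> str:
    """Render (name, sequence) pairs as MEGA-format text: a '#mega' line,
    a '!Title:' line, then each sequence as '#name' followed by lines
    of at most `width` residues."""
    lines = ["#mega", f"!Title: {title}"]
    for name, seq in records:
        lines.append(f"#{name}")  # each sequence name is prefixed with '#'
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines)


if __name__ == "__main__":
    seq = ("ATACATCATAACACTACTTCCTACCCATAAGCTCCTTTTAACTTGTTAAA"
           "GTCTTGCTTGAATTAAAGACTTGTTTAAACACAAAAATTTAGAGTTTTAC")
    print(to_mega([("G019uabh", seq)]))
```

Run on the G019uabh sequence, it prints the same five lines as the converted example, without the blank lines between them.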