Converting MSF format
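The MSF format is an interleaved format: after a header that lists each sequence name, the data are written in blocks, and every line holds one segment of one sequence prefixed by that sequence's name. Placing corresponding segments of different sequences next to each other is meant to simplify the comparison of sequences with similar lengths. A typical file looks like this: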
G006uaah MSF: 240 Type: N Wed Sep 20 12:57:06 MDT 2000 Check: 0 ..
Name: G019uabh Len: 400 Check: 0 Weight: 1.00
Name: G028uaah Len: 268 Check: 0 Weight: 1.00
Name: G022uabh Len: 257 Check: 0 Weight: 1.00
Name: G023uabh Len: 347 Check: 0 Weight: 1.00
Name: G006uaah Len: 240 Check: 0 Weight: 1.00
//
G019uabh ATACATCATA ACACTACTTC CTACCCATAA GCTCCTTTTA ACTTGTTAAA
G028uaah CATAAGCTCC TTTTAACTTG TTAAAGTCTT GCTTGAATTA AAGACTTGTT
G022uabh TATTTTAGAG ACCCAAGTTT TTGACCTTTT CCATGTTTAC ATCAATCCTG
G023uabh AATAAATACC AAAAAAATAG TATATCTACA TAGAATTTCA CATAAAATAA
G006uaah ACATAAAATA AACTGTTTTC TATGTGAAAA TTAACCTANN ATATGCTTTG
G019uabh GTCTTGCTTG AATTAAAGAC TTGTTTAAAC ACAAAAATTT AGAGTTTTAC
G028uaah TAAACACAAA ATTTAGACTT TTACTCAACA AAAGTGATTG ATTGATTGAT
G022uabh TAGGTGATTG GGCAGCCATT TAAGTATTAT TATAGACATT TTCACTATCC
G023uabh ACTGTTTTCT ATGTGAAAAT TAACCTAAAA ATATGCTTTG CTTATGTTTA
G006uaah CTTATGTTTA AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT
G019uabh TCAACAAAAG TGATTGATTG ATTGATTGAT TGATTGATGG TTTACAGTAG
G028uaah TGATTGATTG ATGGTTTACA GTAGGACTTC ATTCTAGTCA TTATAGCTGC
G022uabh CATTAAAACC CTTTATGCCC ATACATCATA ACACTACTTC CTACCCATAA
G023uabh AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT AATCCTCTAC
G006uaah AATCCTCTAA GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG
G019uabh GACTTCATTC TAGTCATTAT AGCTGCTGGC AGTATAACTG GCCAGCCTTT
G028uaah TGGCAGTATA ACTGGCCAGC CTTTAATACA TTGCTGCTTA GAGTCAAAGC
G022uabh GCTCCTTTTA ACTTGTTAAA GTCTTGCTTG AATTAAAGAC TTGTTTAAAC
G023uabh GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG GAAATAAAAT
G006uaah GAAATAAAAT GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT
G019uabh AATACATTGC TGCTTAGAGT CAAAGCATGT ACTTAGAGTT GGTATGATTT
G028uaah ATGTACTTAG AGTTGGTATG ATTTATCTTT TTGGTCTTCT ATAGCCTCCT
G022uabh ACAAAATTTA GACTTTTACT CAACAAAAGT GATTGATTGA TTGATTGATT
G023uabh GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT GAAATGCTTT
G006uaah GAAATGCTTT TTAAAAGAAA ATATTAAAGT TAAACTCCCC
G019uabh ATCTTTTTGG TCTTCTATAG CCTCCTTCCC CATCCCCATC AGTCTTAATC
G028uaah TCCCCATCCC ATCAGTCT
G022uabh GATTGAT
G023uabh TTAAAAGAAA ATATTAAAGT TAAACTCCCC TATTTTGCTC GTTTTTGCTT
G019uabh AGTCTTGTTA CGTTATGACT AATCTTTGGG GATTGTGCAG AATGTTATTT
G023uabh ATCTAAAATA CATTCTGCAC AATCCCCAAA GATTGATCAT ACGTTAC
G019uabh TAGATAAGCA AAACGAGCAA AATGGGGAGT TACTTATATT TCTTTAAAGC
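The MEGA format converter “unravels” the interleaved data that follow the // divider: it extracts every line beginning with the first name, then every line beginning with the second name, and so on, and writes each sequence as one contiguous block under its name. For the file above, it ultimately produces a corresponding file that looks like this: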
#mega
Title: thisfile.msf
#G019uabh
ATACATCATA ACACTACTTC CTACCCATAA GCTCCTTTTA ACTTGTTAAA
GTCTTGCTTG AATTAAAGAC TTGTTTAAAC ACAAAAATTT AGAGTTTTAC
TCAACAAAAG TGATTGATTG ATTGATTGAT TGATTGATGG TTTACAGTAG
GACTTCATTC TAGTCATTAT AGCTGCTGGC AGTATAACTG GCCAGCCTTT
AATACATTGC TGCTTAGAGT CAAAGCATGT ACTTAGAGTT GGTATGATTT
ATCTTTTTGG TCTTCTATAG CCTCCTTCCC CATCCCCATC AGTCTTAATC
AGTCTTGTTA CGTTATGACT AATCTTTGGG GATTGTGCAG AATGTTATTT
TAGATAAGCA AAACGAGCAA AATGGGGAGT TACTTATATT TCTTTAAAGC
#G028uaah
CATAAGCTCC TTTTAACTTG TTAAAGTCTT GCTTGAATTA AAGACTTGTT
TAAACACAAA ATTTAGACTT TTACTCAACA AAAGTGATTG ATTGATTGAT
TGATTGATTG ATGGTTTACA GTAGGACTTC ATTCTAGTCA TTATAGCTGC
TGGCAGTATA ACTGGCCAGC CTTTAATACA TTGCTGCTTA GAGTCAAAGC
ATGTACTTAG AGTTGGTATG ATTTATCTTT TTGGTCTTCT ATAGCCTCCT
TCCCCATCCC ATCAGTCT
#G022uabh
TATTTTAGAG ACCCAAGTTT TTGACCTTTT CCATGTTTAC ATCAATCCTG
TAGGTGATTG GGCAGCCATT TAAGTATTAT TATAGACATT TTCACTATCC
CATTAAAACC CTTTATGCCC ATACATCATA ACACTACTTC CTACCCATAA
GCTCCTTTTA ACTTGTTAAA GTCTTGCTTG AATTAAAGAC TTGTTTAAAC
ACAAAATTTA GACTTTTACT CAACAAAAGT GATTGATTGA TTGATTGATT
GATTGAT
#G023uabh
AATAAATACC AAAAAAATAG TATATCTACA TAGAATTTCA CATAAAATAA
ACTGTTTTCT ATGTGAAAAT TAACCTAAAA ATATGCTTTG CTTATGTTTA
AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT AATCCTCTAC
GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG GAAATAAAAT
GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT GAAATGCTTT
TTAAAAGAAA ATATTAAAGT TAAACTCCCC TATTTTGCTC GTTTTTGCTT
ATCTAAAATA CATTCTGCAC AATCCCCAAA GATTGATCAT ACGTTAC
#G006uaah
ACATAAAATA AACTGTTTTC TATGTGAAAA TTAACCTANN ATATGCTTTG
CTTATGTTTA AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT
AATCCTCTAA GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG
GAAATAAAAT GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT
GAAATGCTTT TTAAAAGAAA ATATTAAAGT TAAACTCCCC
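The unraveling step can be illustrated with a short Python sketch. It is not the converter's actual code; the function name msf_to_mega and its title parameter are hypothetical, and the sketch assumes that sequence names appear on "Name:" header lines, that the data begin after a line containing only //, and that the output should use the same "#mega" / "Title:" layout shown above.

```python
def msf_to_mega(msf_text, title="thisfile.msf"):
    """Regroup interleaved MSF data lines by sequence name (illustrative sketch)."""
    names = []       # sequence names, in header order
    segments = {}    # name -> list of data lines belonging to that sequence
    in_body = False  # becomes True once the '//' divider has been seen

    for line in msf_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines between blocks
        if not in_body:
            if parts[0] == "//":
                in_body = True
            elif parts[0] == "Name:" and len(parts) > 1:
                names.append(parts[1])
                segments[parts[1]] = []
            # Any other header line is ignored, including a first line
            # that happens to start with a sequence name.
            continue
        if parts[0] in segments:
            # Keep the 10-character groups as written, minus the name.
            segments[parts[0]].append(" ".join(parts[1:]))

    out = ["#mega", f"Title: {title}"]
    for name in names:
        out.append(f"#{name}")
        out.extend(segments[name])
    return "\n".join(out) + "\n"
```

Calling msf_to_mega on the text of the MSF file above would group the eight G019uabh lines first, then the G028uaah lines, and so on, in the order the names appear in the header. Shorter sequences simply contribute fewer lines, as G006uaah does in the example.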