Converting MSF Format

 


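The MSF format is an interleaved format designed to simplify the comparison of sequences of similar length. A header lists the name of each sequence, a line containing only // ends the header, and the sequence data follows in interleaved blocks, as in this example: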

 

G006uaah MSF: 240 Type: N Wed Sep 20 12:57:06 MDT 2000 Check: 0 ..

 

Name: G019uabh Len: 400 Check: 0 Weight: 1.00 

Name: G028uaah Len: 268 Check: 0 Weight: 1.00 

Name: G022uabh Len: 257 Check: 0 Weight: 1.00 

Name: G023uabh Len: 347 Check: 0 Weight: 1.00 

Name: G006uaah Len: 240 Check: 0 Weight: 1.00 

//

 

G019uabh ATACATCATA ACACTACTTC CTACCCATAA GCTCCTTTTA ACTTGTTAAA 

G028uaah CATAAGCTCC TTTTAACTTG TTAAAGTCTT GCTTGAATTA AAGACTTGTT 

G022uabh TATTTTAGAG ACCCAAGTTT TTGACCTTTT CCATGTTTAC ATCAATCCTG 

G023uabh AATAAATACC AAAAAAATAG TATATCTACA TAGAATTTCA CATAAAATAA 

G006uaah ACATAAAATA AACTGTTTTC TATGTGAAAA TTAACCTANN ATATGCTTTG 

G019uabh GTCTTGCTTG AATTAAAGAC TTGTTTAAAC ACAAAAATTT AGAGTTTTAC 

G028uaah TAAACACAAA ATTTAGACTT TTACTCAACA AAAGTGATTG ATTGATTGAT 

G022uabh TAGGTGATTG GGCAGCCATT TAAGTATTAT TATAGACATT TTCACTATCC 

G023uabh ACTGTTTTCT ATGTGAAAAT TAACCTAAAA ATATGCTTTG CTTATGTTTA 

G006uaah CTTATGTTTA AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT 

G019uabh TCAACAAAAG TGATTGATTG ATTGATTGAT TGATTGATGG TTTACAGTAG 

G028uaah TGATTGATTG ATGGTTTACA GTAGGACTTC ATTCTAGTCA TTATAGCTGC 

G022uabh CATTAAAACC CTTTATGCCC ATACATCATA ACACTACTTC CTACCCATAA 

G023uabh AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT AATCCTCTAC 

G006uaah AATCCTCTAA GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG 

G019uabh GACTTCATTC TAGTCATTAT AGCTGCTGGC AGTATAACTG GCCAGCCTTT 

G028uaah TGGCAGTATA ACTGGCCAGC CTTTAATACA TTGCTGCTTA GAGTCAAAGC 

G022uabh GCTCCTTTTA ACTTGTTAAA GTCTTGCTTG AATTAAAGAC TTGTTTAAAC 

G023uabh GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG GAAATAAAAT 

G006uaah GAAATAAAAT GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT 

G019uabh AATACATTGC TGCTTAGAGT CAAAGCATGT ACTTAGAGTT GGTATGATTT 

G028uaah ATGTACTTAG AGTTGGTATG ATTTATCTTT TTGGTCTTCT ATAGCCTCCT 

G022uabh ACAAAATTTA GACTTTTACT CAACAAAAGT GATTGATTGA TTGATTGATT 

G023uabh GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT GAAATGCTTT 

G006uaah GAAATGCTTT TTAAAAGAAA ATATTAAAGT TAAACTCCCC 

G019uabh ATCTTTTTGG TCTTCTATAG CCTCCTTCCC CATCCCCATC AGTCTTAATC 

G028uaah TCCCCATCCC ATCAGTCT 

G022uabh GATTGAT 

G023uabh TTAAAAGAAA ATATTAAAGT TAAACTCCCC TATTTTGCTC GTTTTTGCTT 

G019uabh AGTCTTGTTA CGTTATGACT AATCTTTGGG GATTGTGCAG AATGTTATTT 

G023uabh ATCTAAAATA CATTCTGCAC AATCCCCAAA GATTGATCAT ACGTTAC 

G019uabh TAGATAAGCA AAACGAGCAA AATGGGGAGT TACTTATATT TCTTTAAAGC 

 

The MEGA format converter “unravels” the interleaved data by collecting every line that begins with the first sequence name, then every line that begins with the second name, and so on. The resulting file looks like this:

 

#mega

Title: thisfile.msf

 

#G019uabh

ATACATCATA ACACTACTTC CTACCCATAA GCTCCTTTTA ACTTGTTAAA

GTCTTGCTTG AATTAAAGAC TTGTTTAAAC ACAAAAATTT AGAGTTTTAC

TCAACAAAAG TGATTGATTG ATTGATTGAT TGATTGATGG TTTACAGTAG

GACTTCATTC TAGTCATTAT AGCTGCTGGC AGTATAACTG GCCAGCCTTT

AATACATTGC TGCTTAGAGT CAAAGCATGT ACTTAGAGTT GGTATGATTT

ATCTTTTTGG TCTTCTATAG CCTCCTTCCC CATCCCCATC AGTCTTAATC

AGTCTTGTTA CGTTATGACT AATCTTTGGG GATTGTGCAG AATGTTATTT

TAGATAAGCA AAACGAGCAA AATGGGGAGT TACTTATATT TCTTTAAAGC

 

#G028uaah

CATAAGCTCC TTTTAACTTG TTAAAGTCTT GCTTGAATTA AAGACTTGTT

TAAACACAAA ATTTAGACTT TTACTCAACA AAAGTGATTG ATTGATTGAT

TGATTGATTG ATGGTTTACA GTAGGACTTC ATTCTAGTCA TTATAGCTGC

TGGCAGTATA ACTGGCCAGC CTTTAATACA TTGCTGCTTA GAGTCAAAGC

ATGTACTTAG AGTTGGTATG ATTTATCTTT TTGGTCTTCT ATAGCCTCCT

TCCCCATCCC ATCAGTCT

 

#G022uabh

TATTTTAGAG ACCCAAGTTT TTGACCTTTT CCATGTTTAC ATCAATCCTG

TAGGTGATTG GGCAGCCATT TAAGTATTAT TATAGACATT TTCACTATCC

CATTAAAACC CTTTATGCCC ATACATCATA ACACTACTTC CTACCCATAA

GCTCCTTTTA ACTTGTTAAA GTCTTGCTTG AATTAAAGAC TTGTTTAAAC

ACAAAATTTA GACTTTTACT CAACAAAAGT GATTGATTGA TTGATTGATT

GATTGAT

 

#G023uabh

AATAAATACC AAAAAAATAG TATATCTACA TAGAATTTCA CATAAAATAA

ACTGTTTTCT ATGTGAAAAT TAACCTAAAA ATATGCTTTG CTTATGTTTA

AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT AATCCTCTAC

GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG GAAATAAAAT

GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT GAAATGCTTT

TTAAAAGAAA ATATTAAAGT TAAACTCCCC TATTTTGCTC GTTTTTGCTT

ATCTAAAATA CATTCTGCAC AATCCCCAAA GATTGATCAT ACGTTAC

 

#G006uaah

ACATAAAATA AACTGTTTTC TATGTGAAAA TTAACCTANN ATATGCTTTG

CTTATGTTTA AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT

AATCCTCTAA GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG

GAAATAAAAT GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT

GAAATGCTTT TTAAAAGAAA ATATTAAAGT TAAACTCCCC
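
The unraveling step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the converter's actual code: the function name msf_to_mega and its parameters are hypothetical, and the sketch assumes that sequence names contain no spaces, that every name appears in a "Name:" header line, and that the output should mirror the layout of the example above.

```python
def msf_to_mega(msf_text, title):
    """Unravel interleaved MSF text into sequential MEGA-style text.

    Hypothetical helper for illustration; the output layout mirrors
    the example shown in this topic.
    """
    names = []            # sequence names, in header order
    blocks = {}           # name -> list of data lines for that sequence
    in_alignment = False  # becomes True once the "//" separator is seen

    for line in msf_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between blocks
        if not in_alignment:
            if stripped.startswith("Name:"):
                # Header entry such as "Name: G019uabh Len: 400 ..."
                name = stripped.split()[1]
                names.append(name)
                blocks[name] = []
            elif stripped == "//":
                in_alignment = True
            continue
        # Data line: first token is the name, the rest is sequence data
        parts = stripped.split(None, 1)
        if len(parts) == 2 and parts[0] in blocks:
            blocks[parts[0]].append(parts[1].strip())

    # Write each sequence as one contiguous block under "#name"
    out = ["#mega", "Title: " + title, ""]
    for name in names:
        out.append("#" + name)
        out.extend(blocks[name])
        out.append("")
    return "\n".join(out)
```

Calling msf_to_mega(text, "thisfile.msf") on the MSF example above would gather the G019uabh lines first, then the G028uaah lines, and so on, in the order the names appear in the header. Lines that start with a name not declared in the header are ignored in this sketch; a real converter would probably report them instead.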