Converting MSF Format

 


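The MSF format is an interleaved format designed to simplify the comparison of sequences with similar lengths. A header lists each sequence name and length, a line containing only // ends the header, and the sequence data follows in blocks, with one line per sequence in each block. A sample MSF file looks like this: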

 

G006uaah MSF: 240 Type: N Wed Sep 20 12:57:06 MDT 2000 Check: 0 ..

 

Name: G019uabh Len: 400 Check: 0 Weight: 1.00

Name: G028uaah Len: 268 Check: 0 Weight: 1.00

Name: G022uabh Len: 257 Check: 0 Weight: 1.00

Name: G023uabh Len: 347 Check: 0 Weight: 1.00

Name: G006uaah Len: 240 Check: 0 Weight: 1.00

//

 

G019uabh ATACATCATA ACACTACTTC CTACCCATAA GCTCCTTTTA ACTTGTTAAA

G028uaah CATAAGCTCC TTTTAACTTG TTAAAGTCTT GCTTGAATTA AAGACTTGTT

G022uabh TATTTTAGAG ACCCAAGTTT TTGACCTTTT CCATGTTTAC ATCAATCCTG

G023uabh AATAAATACC AAAAAAATAG TATATCTACA TAGAATTTCA CATAAAATAA

G006uaah ACATAAAATA AACTGTTTTC TATGTGAAAA TTAACCTANN ATATGCTTTG

G019uabh GTCTTGCTTG AATTAAAGAC TTGTTTAAAC ACAAAAATTT AGAGTTTTAC

G028uaah TAAACACAAA ATTTAGACTT TTACTCAACA AAAGTGATTG ATTGATTGAT

G022uabh TAGGTGATTG GGCAGCCATT TAAGTATTAT TATAGACATT TTCACTATCC

G023uabh ACTGTTTTCT ATGTGAAAAT TAACCTAAAA ATATGCTTTG CTTATGTTTA

G006uaah CTTATGTTTA AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT

G019uabh TCAACAAAAG TGATTGATTG ATTGATTGAT TGATTGATGG TTTACAGTAG

G028uaah TGATTGATTG ATGGTTTACA GTAGGACTTC ATTCTAGTCA TTATAGCTGC

G022uabh CATTAAAACC CTTTATGCCC ATACATCATA ACACTACTTC CTACCCATAA

G023uabh AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT AATCCTCTAC

G006uaah AATCCTCTAA GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG

G019uabh GACTTCATTC TAGTCATTAT AGCTGCTGGC AGTATAACTG GCCAGCCTTT

G028uaah TGGCAGTATA ACTGGCCAGC CTTTAATACA TTGCTGCTTA GAGTCAAAGC

G022uabh GCTCCTTTTA ACTTGTTAAA GTCTTGCTTG AATTAAAGAC TTGTTTAAAC

G023uabh GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG GAAATAAAAT

G006uaah GAAATAAAAT GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT

G019uabh AATACATTGC TGCTTAGAGT CAAAGCATGT ACTTAGAGTT GGTATGATTT

G028uaah ATGTACTTAG AGTTGGTATG ATTTATCTTT TTGGTCTTCT ATAGCCTCCT

G022uabh ACAAAATTTA GACTTTTACT CAACAAAAGT GATTGATTGA TTGATTGATT

G023uabh GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT GAAATGCTTT

G006uaah GAAATGCTTT TTAAAAGAAA ATATTAAAGT TAAACTCCCC

G019uabh ATCTTTTTGG TCTTCTATAG CCTCCTTCCC CATCCCCATC AGTCTTAATC

G028uaah TCCCCATCCC ATCAGTCT

G022uabh GATTGAT

G023uabh TTAAAAGAAA ATATTAAAGT TAAACTCCCC TATTTTGCTC GTTTTTGCTT

G019uabh AGTCTTGTTA CGTTATGACT AATCTTTGGG GATTGTGCAG AATGTTATTT

G023uabh ATCTAAAATA CATTCTGCAC AATCCCCAAA GATTGATCAT ACGTTAC

G019uabh TAGATAAGCA AAACGAGCAA AATGGGGAGT TACTTATATT TCTTTAAAGC

 

The MEGA format converter “unravels” the interleaved data by collecting, in order, every line that begins with the first sequence name, then every line that begins with the second name, and so on. The result is a MEGA file in which each sequence appears as one contiguous block:

 

#mega

Title: thisfile.msf

 

#G019uabh

ATACATCATA ACACTACTTC CTACCCATAA GCTCCTTTTA ACTTGTTAAA

GTCTTGCTTG AATTAAAGAC TTGTTTAAAC ACAAAAATTT AGAGTTTTAC

TCAACAAAAG TGATTGATTG ATTGATTGAT TGATTGATGG TTTACAGTAG

GACTTCATTC TAGTCATTAT AGCTGCTGGC AGTATAACTG GCCAGCCTTT

AATACATTGC TGCTTAGAGT CAAAGCATGT ACTTAGAGTT GGTATGATTT

ATCTTTTTGG TCTTCTATAG CCTCCTTCCC CATCCCCATC AGTCTTAATC

AGTCTTGTTA CGTTATGACT AATCTTTGGG GATTGTGCAG AATGTTATTT

TAGATAAGCA AAACGAGCAA AATGGGGAGT TACTTATATT TCTTTAAAGC

 

#G028uaah

CATAAGCTCC TTTTAACTTG TTAAAGTCTT GCTTGAATTA AAGACTTGTT

TAAACACAAA ATTTAGACTT TTACTCAACA AAAGTGATTG ATTGATTGAT

TGATTGATTG ATGGTTTACA GTAGGACTTC ATTCTAGTCA TTATAGCTGC

TGGCAGTATA ACTGGCCAGC CTTTAATACA TTGCTGCTTA GAGTCAAAGC

ATGTACTTAG AGTTGGTATG ATTTATCTTT TTGGTCTTCT ATAGCCTCCT

TCCCCATCCC ATCAGTCT

 

#G022uabh

TATTTTAGAG ACCCAAGTTT TTGACCTTTT CCATGTTTAC ATCAATCCTG

TAGGTGATTG GGCAGCCATT TAAGTATTAT TATAGACATT TTCACTATCC

CATTAAAACC CTTTATGCCC ATACATCATA ACACTACTTC CTACCCATAA

GCTCCTTTTA ACTTGTTAAA GTCTTGCTTG AATTAAAGAC TTGTTTAAAC

ACAAAATTTA GACTTTTACT CAACAAAAGT GATTGATTGA TTGATTGATT

GATTGAT

 

#G023uabh

AATAAATACC AAAAAAATAG TATATCTACA TAGAATTTCA CATAAAATAA

ACTGTTTTCT ATGTGAAAAT TAACCTAAAA ATATGCTTTG CTTATGTTTA

AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT AATCCTCTAC

GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG GAAATAAAAT

GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT GAAATGCTTT

TTAAAAGAAA ATATTAAAGT TAAACTCCCC TATTTTGCTC GTTTTTGCTT

ATCTAAAATA CATTCTGCAC AATCCCCAAA GATTGATCAT ACGTTAC

 

#G006uaah

ACATAAAATA AACTGTTTTC TATGTGAAAA TTAACCTANN ATATGCTTTG

CTTATGTTTA AGATGTCATG CTTTTTATCA GTTGAGGAGT TCAGCTTAAT

AATCCTCTAA GATCTTAAAC AAATAGGAAA AAAACTAAAA GTAGAAAATG

GAAATAAAAT GTCAAAGCAT TTCTACCACT CAGAATTGAT CTTATAACAT

GAAATGCTTT TTAAAAGAAA ATATTAAAGT TAAACTCCCC
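
The unraveling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the converter's actual implementation: the function name msf_to_mega and its title parameter are hypothetical, the header is assumed to end at a line containing only //, and every data line is assumed to start with a sequence name followed by whitespace and residues.

```python
def msf_to_mega(msf_text, title):
    """Unravel interleaved MSF text into MEGA-style sequential blocks.

    Hypothetical helper for illustration; assumes the header ends at a
    line containing only "//" and that each data line is "<name> <residues>".
    """
    sequences = {}          # name -> list of residue lines, in first-seen order
    in_alignment = False    # becomes True once the "//" divider is passed

    for raw_line in msf_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue                      # skip blank separator lines
        if not in_alignment:
            if line == "//":
                in_alignment = True       # header is finished
            continue
        parts = line.split(None, 1)       # name, then the rest of the line
        if len(parts) < 2:
            continue                      # no residues on this line
        name, residues = parts
        sequences.setdefault(name, []).append(residues.strip())

    out = ["#mega", f"Title: {title}", ""]
    for name, rows in sequences.items():  # dicts keep insertion order
        out.append(f"#{name}")
        out.extend(rows)
        out.append("")
    return "\n".join(out).rstrip() + "\n"
```

Because sequences are keyed by name in first-seen order, a shorter sequence that stops appearing in later blocks (such as G022uabh in the sample) simply ends up with fewer lines in its output block.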