Convert GCG Format


Converting GCG Format


GCG files consist of one or more groups of non-blank lines separated by one or more blank lines. Within a group, the non-blank lines look similar to this:


Chloroflex
Chloroflex Length: 428 Mon Sep 25 17:34:20 MDT 2000 Check: 0 ..
1 MSKEHVQTIA TDDVSKNGHT PPTNASTPPY PFVAIVGQAE LKLALLLCVV
51 NPTIGGVMVM GHRGTAKSTA VRALAAMLPP IKAVAGCPYS CAPDRTAGLC
101 DQCRALEQQS GKTKKPAVIN IPVPVVDLPL GATEDRVCGT LDIERALTQG
151 VQAFAPGLLA RANRGFLYID EVNLLEDHLV DVLLDVAASG VNVVEREGVS
201 VRHPARFVLV GSGNPEEGDL RPQLLDRFGL HARITTITDV SERVEIVKRR
251 REYDADPFAF VEKWAKETQK LQRKIKQAQR RLPEVILPDP VLYKIAELCV
301 KLEVDGHRGE LTLARA.ATA LAALEGRNEV TVQDVRRIAV LALRHRLRKD
351 PLETQD.... ...DAVRIER AVEEVLVP.. .......... ..........
401 .......... .......... ........
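The grouping rule described above (groups of non-blank lines separated by one or more blank lines) can be sketched in Python as follows. This is an illustrative sketch, not MEGA's own code: the function name `split_groups` is hypothetical, and lines containing only whitespace are assumed to count as blank.

```python
def split_groups(text: str) -> list[list[str]]:
    """Split text into groups of consecutive non-blank lines.

    Hypothetical helper: one or more blank (or whitespace-only) lines
    end the current group.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: keep it (minus trailing spaces) in the current group.
            current.append(line.rstrip())
        elif current:
            # First blank line after a group closes that group.
            groups.append(current)
            current = []
    if current:
        # The file may end without a trailing blank line.
        groups.append(current)
    return groups
```

Applied to the example above, this yields a single group of 11 lines, the first being the name line `Chloroflex`.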


The “Check” tag near the end of a line marks the header line of a new sequence. The name of the sequence is taken from the preceding line, and the lines that follow, up to the next blank line, are read as the sequence data. From each of these lines the leading digits (the position numbers) are stripped off, and the rest of the line is used. The following shows the conversion of the sequence above.


#mega
Title: infile.gcg

#Chloroflex
MSKEHVQTIA TDDVSKNGHT PPTNASTPPY PFVAIVGQAE LKLALLLCVV
NPTIGGVMVM GHRGTAKSTA VRALAAMLPP IKAVAGCPYS CAPDRTAGLC
DQCRALEQQS GKTKKPAVIN IPVPVVDLPL GATEDRVCGT LDIERALTQG
VQAFAPGLLA RANRGFLYID EVNLLEDHLV DVLLDVAASG VNVVEREGVS
VRHPARFVLV GSGNPEEGDL RPQLLDRFGL HARITTITDV SERVEIVKRR
REYDADPFAF VEKWAKETQK LQRKIKQAQR RLPEVILPDP VLYKIAELCV
KLEVDGHRGE LTLARA.ATA LAALEGRNEV TVQDVRRIAV LALRHRLRKD
PLETQD.... ...DAVRIER AVEEVLVP.. .......... ..........
.......... .......... ........
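The conversion rules, together with the output layout shown above, can be sketched as a small Python converter. This is an illustrative sketch, not MEGA's implementation. The function names `parse_gcg` and `gcg_to_mega` are hypothetical. A line is assumed to carry the Check tag when it contains the text `Check:`. Raising an error when no name line precedes the Check line is a choice made here, not documented MEGA behavior. The `title` argument supplies the `Title:` value (the input file name in the example).

```python
import re

# Assumption: a line "carries the Check tag" if it contains this text anywhere.
CHECK_TAG = "Check:"
# Leading position number (e.g. "51") at the start of a sequence line.
LEADING_DIGITS = re.compile(r"^\s*\d+")


def parse_gcg(text: str) -> list[tuple[str, list[str]]]:
    """Return (name, sequence_lines) pairs found in GCG-formatted text."""
    lines = text.splitlines()
    sequences: list[tuple[str, list[str]]] = []
    i = 0
    while i < len(lines):
        if CHECK_TAG not in lines[i]:
            i += 1
            continue
        # The sequence name comes from the line just before the Check line.
        # (Erroring out here is this sketch's choice, not MEGA's documented rule.)
        if i == 0 or not lines[i - 1].strip():
            raise ValueError(f"no name line before line {i + 1}")
        name = lines[i - 1].strip()
        body: list[str] = []
        i += 1
        # Collect the following lines up to the next blank line (or end of text).
        while i < len(lines) and lines[i].strip():
            # Strip the leading digits and keep the rest of the line.
            body.append(LEADING_DIGITS.sub("", lines[i]).strip())
            i += 1
        sequences.append((name, body))
    return sequences


def gcg_to_mega(text: str, title: str) -> str:
    """Render the parsed sequences in the MEGA layout shown above."""
    out = ["#mega", f"Title: {title}", ""]
    for name, body in parse_gcg(text):
        out.append(f"#{name}")
        out.extend(body)
    return "\n".join(out) + "\n"
```

For instance, `gcg_to_mega(pathlib.Path("infile.gcg").read_text(), "infile.gcg")` would render the Chloroflex example in the layout above; each further group containing a Check line adds another `#name` block.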