Tags: python, pandas, dataframe, text-processing, biopython

How to insert sequence at specific position?


I would like to insert a specific sequence at defined positions in multiple sequences from a FASTA-formatted file, and write all of the modified sequences to a single output file.

I have tried the following. With the code below I can print the records, but I cannot insert the sequence at the position; I can only export the unmodified record data.

from Bio import SeqIO
import pandas as pd

output_handle2 = open("new_fasta2.fasta", "a")

records1 = SeqIO.index("file_test.fa", "fasta")
candidate_df = pd.read_csv("file_test.csv")
for i in candidate_df['refseq']:
    if i in records1:
        print(">" + records1[i].id + "_" + "\n" + records1[i].seq)
        SeqIO.write(records1[i], output_handle2, 'fasta')

output_handle2.close()

The code below prints the record and inserted sequence for only one position (column 3).

temp = {}
for line in open("file_test.csv", "r"):
    i, c, d = line.strip().split(',')
    temp[i] = c
    temp[i] = d  # replaces c, so only the last position per ID is kept

for rec in SeqIO.parse("file_test.fa", "fasta"):
    if str(rec.id) in temp.keys():
        print(">" + str(rec.id) + "_" + temp[rec.id])
        a = temp[rec.id]
        b = int(len(rec)) - int(a)  # offset counted from the end of the sequence
        print(str(rec.seq[:len(rec) - int(a)] + "_sequence_" + rec.seq[b:]))

FASTA formatted file

>NM_030649
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
>NM_001256456
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS

Positions at which to insert the sequence

column 1: label
column 2: 2nd part of label
column 3: position

NM_030649       1   33
NM_030649       2   69
NM_001256456    1   91
NM_001256456    2   202

Custom sequence to insert (shown here in lower case so that it is easy to spot in the example below; the final sequence will be upper case):

sequence

Example output

>NM_030649_1
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLsequenceDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
>NM_030649_2
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQsequenceGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
>NM_001256456_1
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGsequenceYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
>NM_001256456_2
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRsequencePNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
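
Reading the example output, position 33 appears to mean "insert after the first 33 residues", which is the same as a Python slice index counted from the start of the sequence. A minimal sketch of that reading, using only the first 40 residues of NM_030649 (the helper name `insert_at` is hypothetical, not part of Biopython):

```python
def insert_at(seq, pos, insert):
    """Return seq with insert placed after the first pos characters."""
    return seq[:pos] + insert + seq[pos:]


# First 40 residues of NM_030649, copied from the FASTA file above
prefix = "MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLC"
result = insert_at(prefix, 33, "sequence")
print(result)  # MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLsequenceDKLVKLC
```

The same slicing works on a Biopython `Seq` object, so the position from column 3 can be used directly as the slice index rather than being subtracted from the sequence length.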

Solution

  • second attempt:

    CSV file, file_test.csv:

    NM_030649,   1,   33
    NM_030649,   2,   69
    NM_001256456,    1,  91
    NM_001256456,    2,   202
    

    FASTA file, file_test.fa:

    >NM_030649
    MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
    >NM_001256456
    MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
    
    

    my code:

    from Bio import SeqIO
    from Bio.Seq import Seq

    # Sequence to insert (the underscores only make it easy to spot in the output)
    sequence = "_sequence_"

    # Map "<id>_<2nd part of label>" -> insertion position (kept as a string)
    temp = {}
    with open("file_test.csv", "r") as csv_file:
        for line in csv_file:
            if not line.strip():
                continue  # skip blank lines
            i, c, d = line.strip().replace(" ", "").split(',')
            temp[i + '_' + c] = d

    print(temp, '\n')

    with open("result.fa", "w") as handle:
        for label in temp.keys():
            # Drop the "_<n>" suffix to recover the original FASTA record ID
            if label.rsplit('_', 1)[0] in [rec.id for rec in SeqIO.parse("file_test.fa", "fasta")]:
                for rec in SeqIO.parse("file_test.fa", "fasta"):
                    if rec.id == label.rsplit('_', 1)[0]:
                        print('record :')
                        print(rec)
                        print('..................')
                        print(label, len(rec.seq), len(sequence))
                        print(rec.seq[:int(temp[label])] + sequence
                              + rec.seq[int(temp[label]):])

                        # Rename the record and clear name/description so the
                        # FASTA header is just the new label
                        rec.id = label
                        rec.name = ''
                        rec.description = ''

                        # Splice the insert in at the given position
                        rec.seq = Seq(str(rec.seq[:int(temp[label])] + sequence
                                          + rec.seq[int(temp[label]):]))

                        print(len(rec.seq), len(sequence), '\n')

                        SeqIO.write(rec, handle, "fasta")
                        print('_______________________')

    

    Output (printed, not saved):

    {'NM_030649_1': '33', 'NM_030649_2': '69', 'NM_001256456_1': '91', 'NM_001256456_2': '202'} 
    
    record :
    ID: NM_030649
    Name: NM_030649
    Description: NM_030649
    Number of features: 0
    Seq('MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTS...EES')
    ..................
    NM_030649_1 834 10
    MTVEFEECVKDSPRFRATIDEVETDVVEIEAKL_sequence_DKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
    844 10 
    
    _______________________
    record :
    ID: NM_030649
    Name: NM_030649
    Description: NM_030649
    Number of features: 0
    Seq('MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTS...EES')
    ..................
    NM_030649_2 834 10
    MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQ_sequence_GDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
    844 10 
    
    _______________________
    record :
    ID: NM_001256456
    Name: NM_001256456
    Description: NM_001256456
    Number of features: 0
    Seq('MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDF...APS')
    ..................
    NM_001256456_1 606 10
    MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVG_sequence_YDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
    616 10 
    
    _______________________
    record :
    ID: NM_001256456
    Name: NM_001256456
    Description: NM_001256456
    Number of features: 0
    Seq('MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDF...APS')
    ..................
    NM_001256456_2 606 10
    MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCR_sequence_PNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
    616 10 
    
    _______________________
    

    FASTA file saved as result.fa:

    >NM_030649_1
    MTVEFEECVKDSPRFRATIDEVETDVVEIEAKL_sequence_DKLVKLCSGMVEAGKAY
    VSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSF
    VKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALD
    YVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSA
    VEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRW
    FSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKL
    RQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQ
    SVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKL
    MCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAP
    RRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSL
    FSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEAD
    GDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQA
    VLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQR
    DPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLH
    LEES
    >NM_030649_2
    MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSG
    VRDLSQQCQ_sequence_GDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSF
    VKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALD
    YVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSA
    VEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRW
    FSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKL
    RQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQ
    SVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKL
    MCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAP
    RRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSL
    FSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEAD
    GDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQA
    VLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQR
    DPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLH
    LEES
    >NM_001256456_1
    MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQN
    GRLTDFLDCVIISHFHLDHCGALPYFSEMVG_sequence_YDGPIYMTHPTQAICPILL
    EDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMF
    QIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKK
    VHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIP
    WTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNE
    KNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQ
    AEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKRE
    MAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQ
    ETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELG
    SFLTSLLKKGLPQAPS
    >NM_001256456_2
    MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQN
    GRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDK
    KGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVV
    YTGDYNMTPDRHLGAAWIDKCR_sequence_PNLLITESTYATTIRDSKRCRERDFLKK
    VHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIP
    WTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNE
    KNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQ
    AEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKRE
    MAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQ
    ETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELG
    SFLTSLLKKGLPQAPS
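
The code above re-parses file_test.fa once per CSV row. A possible single-pass variant is sketched below, assuming a comma-separated file with exactly three columns (ID, 2nd part of label, position) and positions counted from the start of the sequence. The function names `read_rows`, `build_variants` and `write_variants` are hypothetical helpers introduced for illustration; only `SeqIO.parse`, `SeqIO.write`, `Seq` and `SeqRecord` come from Biopython.

```python
import csv


def read_rows(csv_path):
    """Yield [record_id, label_part, position] from a comma-separated file."""
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            fields = [field.strip() for field in row]
            # Rows that do not have three non-empty fields are skipped
            if len(fields) == 3 and all(fields):
                yield fields


def build_variants(sequences, rows, insert):
    """Yield (new_id, new_sequence) for every row whose ID is in sequences.

    sequences maps record ID -> sequence string; IDs missing from it are skipped.
    """
    for seq_id, part, pos in rows:
        seq = sequences.get(seq_id)
        if seq is None:
            continue
        pos = int(pos)
        yield f"{seq_id}_{part}", seq[:pos] + insert + seq[pos:]


def write_variants(fasta_in, csv_path, fasta_out, insert):
    """Read the FASTA file once, then write one modified record per CSV row."""
    # Imported here so the pure-Python helpers above work without Biopython
    from Bio import SeqIO
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord

    sequences = {rec.id: str(rec.seq) for rec in SeqIO.parse(fasta_in, "fasta")}
    records = (
        SeqRecord(Seq(new_seq), id=new_id, description="")
        for new_id, new_seq in build_variants(sequences, read_rows(csv_path), insert)
    )
    # SeqIO.write returns the number of records written
    return SeqIO.write(records, fasta_out, "fasta")
```

With the files from this answer, the call would be `write_variants("file_test.fa", "file_test.csv", "result.fa", "_sequence_")`. Holding the sequences in a dictionary keeps the FASTA file to a single read; for very large FASTA files, `SeqIO.index` could be used for the lookup instead.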