I would like to insert a specific sequence at a defined position in a FASTA formatted file for multiple sequences, where the modified sequences would be output in a single file.
I have tried the following commands:
I can print the records using the code below, but I cannot insert seq at the position. I can only export fixed record data.
from Bio import SeqIO
import pandas as pd
output_handle2 = open("new_fasta2.fasta", "a")
records1 = SeqIO.index("file_test.fa", "fasta")
candidate_df=pd.read_csv("file_test.csv")
for i in candidate_df['refseq']:
if i in records1:
print(">" + records1[i].id + "_" + "\n" + records1[i].seq)
SeqIO.write(records1[i], output_handle2, 'fasta')
The code below prints the record and inserted sequence for only one position (column 3).
temp = {}
for line in open("file_test.csv","r"):
i, c, d = line.strip().split(',')
temp[i] = c
temp[i] = d
for rec in SeqIO.parse("file_test.fa", "fasta"):
if str(rec.id) in temp.keys():
print(">" + str(rec.id) + "_" + temp[rec.id])
a = temp[rec.id]
b = int(len(rec)) - int(a)
print(str(rec.seq[:len(rec) - int(a)] + "_sequence_" + rec.seq[b:]))
FASTA formatted file
>NM_030649
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
>NM_001256456
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
Positions where to insert sequence
column 1: label
column 2: 2nd part of label
column 3: position
NM_030649 1 33
NM_030649 2 69
NM_001256456 1 91
NM_001256456 2 202
custom sequence to insert - I have indicated a lower case sequence here to easily visualize in example below, but final sequence will be upper case.
sequence
Example output
>NM_030649_1
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLsequenceDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
>NM_030649_2
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQsequenceGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
>NM_001256456_1
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGsequenceYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
>NM_001256456_2
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRsequencePNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
second attempt:
csv file, file_test.csv
:
NM_030649, 1, 33
NM_030649, 2, 69
NM_001256456, 1, 91
NM_001256456, 2, 202
fasta file, file_test.fa
:
>NM_030649
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
>NM_001256456
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
my code:
from Bio import SeqIO
from Bio.Seq import Seq
# import pandas as pd
sequence = "_sequence_"
temp = {}
for line in open("file_test.csv","r"):
# print(line)
i, c, d = line.strip().replace(" ", "").split(',')
# print(i,c,d, '\n\n')
temp[i+'_'+c] = d
# temp[i] = d # ------> removed
print(temp,'\n')
with open("result.fa", "w") as handle:
for label in temp.keys():
# print(label,' ',label.rsplit('_', 1)[0])
if label.rsplit('_', 1)[0] in [rec.id for rec in SeqIO.parse("file_test.fa", "fasta")]:
for rec in SeqIO.parse("file_test.fa", "fasta"):
if rec.id == label.rsplit('_', 1)[0]:
print('record :')
print(rec)
print('..................')
print(label, len(rec.seq), len(sequence))
print(rec.seq[:int(temp[label])] + sequence
+ rec.seq[int(temp[label]):])
rec.id = label
rec.name = ''
rec.description = ''
rec.seq = Seq(str(rec.seq[:int(temp[label])] + sequence
+ rec.seq[int(temp[label]):]))
print(len(rec.seq), len(sequence),'\n')
SeqIO.write(rec, handle, "fasta")
print('_______________________')
output, printed not saved:
{'NM_030649_1': '33', 'NM_030649_2': '69', 'NM_001256456_1': '91', 'NM_001256456_2': '202'}
record :
ID: NM_030649
Name: NM_030649
Description: NM_030649
Number of features: 0
Seq('MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTS...EES')
..................
NM_030649_1 834 10
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKL_sequence_DKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
844 10
_______________________
record :
ID: NM_030649
Name: NM_030649
Description: NM_030649
Number of features: 0
Seq('MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTS...EES')
..................
NM_030649_2 834 10
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSGVRDLSQQCQ_sequence_GDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSFVKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALDYVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSAVEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRWFSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKLRQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQSVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKLMCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAPRRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSLFSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEADGDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQAVLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQRDPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLHLEES
844 10
_______________________
record :
ID: NM_001256456
Name: NM_001256456
Description: NM_001256456
Number of features: 0
Seq('MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDF...APS')
..................
NM_001256456_1 606 10
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVG_sequence_YDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
616 10
_______________________
record :
ID: NM_001256456
Name: NM_001256456
Description: NM_001256456
Number of features: 0
Seq('MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDF...APS')
..................
NM_001256456_2 606 10
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQNGRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVVYTGDYNMTPDRHLGAAWIDKCR_sequence_PNLLITESTYATTIRDSKRCRERDFLKKVHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIPWTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNEKNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQAEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKREMAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELGSFLTSLLKKGLPQAPS
616 10
_______________________
fasta file saved, as result.fa
:
>NM_030649_1
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKL_sequence_DKLVKLCSGMVEAGKAY
VSTSRLFVSGVRDLSQQCQGDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSF
VKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALD
YVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSA
VEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRW
FSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKL
RQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQ
SVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKL
MCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAP
RRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSL
FSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEAD
GDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQA
VLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQR
DPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLH
LEES
>NM_030649_2
MTVEFEECVKDSPRFRATIDEVETDVVEIEAKLDKLVKLCSGMVEAGKAYVSTSRLFVSG
VRDLSQQCQ_sequence_GDTVISECLQRFADSLQEVVNYHMILFDQAQRSVRQQLQSF
VKEDVRKFKETKKQFDKVREDLELSLVRNAQAPRHRPHEVEEATGALTLTRKCFRHLALD
YVLQINVLQAKKKFEILDSMLSFMHAQSSFFQQGYSLLHQLDPYMKKLAAELDQLVIDSA
VEKREMERKHAAIQQRTLLQDFSYDESKVEFDVDAPSGVVMEGYLFKRASNAFKTWNRRW
FSIQNSQLVYQKKLKDALTVVVDDLRLCSVKPCEDIERRFCFEVLSPTKSCMLQADSEKL
RQAWVQAVQASIASAYRESPDSCYSERLDRTASPSTSSIDSATDTRERGVKGESVLQRVQ
SVAGNSQCGDCGQPDPRWASINLGVLLCIECSGIHRSLGVHCSKVRSLTLDSWEPELLKL
MCELGNSAVNQIYEAQCEGAGSRKPTASSSRQDKEAWIKDKYVEKKFLRKAPMAPALEAP
RRWRVQKCLRPHSSPRAPTARRKVRLEPVLPCVAALSSVGTLDRKFRRDSLFCPDELDSL
FSYFDAGAAGAGPRSLSSDSGLGGSSDGSSDVLAFGSGSVVDSVTEEEGAESEESSGEAD
GDTEAEAWGLADVRELHPGLLAHRAARARDLPALAAALAHGAEVNWADAEDEGKTPLVQA
VLGGSLIVCEFLLQNGADVNQRDSRGRAPLHHATLLGRTGQVCLFLKRGADQHALDQEQR
DPLAIAVQAANADIVTLLRLARMAEEMREAEAAPGPPGALAGSPTELQFRRCIQEFISLH
LEES
>NM_001256456_1
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQN
GRLTDFLDCVIISHFHLDHCGALPYFSEMVG_sequence_YDGPIYMTHPTQAICPILL
EDYRKIAVDKKGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMF
QIKVGSESVVYTGDYNMTPDRHLGAAWIDKCRPNLLITESTYATTIRDSKRCRERDFLKK
VHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIP
WTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNE
KNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQ
AEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKRE
MAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQ
ETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELG
SFLTSLLKKGLPQAPS
>NM_001256456_2
MCGAGFGHFEWLAGGGAGQDVGRSCILVSIAGKNVMLDCGMHMGFNDDRRFPDFSYITQN
GRLTDFLDCVIISHFHLDHCGALPYFSEMVGYDGPIYMTHPTQAICPILLEDYRKIAVDK
KGEANFFTSQMIKDCMKKVVAVHLHQTVQVDDELEIKAYYAGHVLGAAMFQIKVGSESVV
YTGDYNMTPDRHLGAAWIDKCR_sequence_PNLLITESTYATTIRDSKRCRERDFLKK
VHETVERGGKVLIPVFALGRAQELCILLETFWERMNLKVPIYFSTGLTEKANHYYKLFIP
WTNQKIRKTFVQRNMFEFKHIKAFDRAFADNPGPMVVFATPGMLHAGQSLQIFRKWAGNE
KNMVIMPGYCVQGTVGHKILSGQRKLEMEGRQVLEVKMQVEYMSFSAHADAKGIMQLVGQ
AEPESVLLVHGEAKKMEFLKQKIEQELRVNCYMPANGETVTLPTSPSIPVGISLGLLKRE
MAQGLLPEAKKPRLLHGTLIMKDSNFRLVSSEQALKELGLAEHQLRFTCRVHLHDTRKEQ
ETALRVYSHLKSVLKDHCVQHLPDGSVTVESVLLQAAAPSEDPGTKVLLVSWTYQDEELG
SFLTSLLKKGLPQAPS