
Get ID and protein sequences in Biopython


I have this code.

    from Bio import SeqIO

    for seq_record in SeqIO.parse("aminoacids.txt", "fasta"):
        print(seq_record.id)
        print(repr(seq_record.seq))

Output:

NP_414584.1

Seq('MNTFSQVWVFSDTPSRLPELMNGAQALANQINTFVLNDADGAQAIQLGANHVWK...LAR')

NP_414563.1

Seq('MASVSISCPSCSATDGVVRNGKSTAGHQRYLCSHCRKTWQLQFTYTASQPGTHQ...RSR')

NP_414564.1

Seq('MANIKSAKKRAIQSEKARKHNASRRSMMRTFIKKVYAAIEAGDKAAAQKAFNEM...KLA')

NP_414565.1

Seq('MCRHSLRSDGAGFYQLAGCEYSFSAIKIAAGGQFLPVICAMAMKSHFFLISVLN...SLF')

NP_414566.1

Seq('MKLIRGIHNLSQAPQEGCVLTIGNFDGVHRGHRALLQGLQEEGRKRNLPVMVML...KPA')

Problem: I want the ID and the full sequence, without "Seq" at the beginning and as a single string. Something like this:

NP_414584.1
MNTFSQVWVFSDTPSRLPELMNGAQALANQINTFVLNDADGAQAIQLGANHVWKLNGKPDDRMIEDYAGVMADTIRQHGADGLVLLPNTRRGKLLAAKLGYRLKAAVSNDASTVSVQDGKATVKHMVYGGLAIGEERIATPYAVLTISSGTFDAAQPDASRTGETHTVEWQAPAVAITRTATQARQSNSVDLDKARLVVSVGRGIGSKENIALAEQLCKAIGAELACSRPVAENEKWMEHERYVGISNLMLKPELYLAVGISGQIQHMVGANASQTIFAI NKDKNAPIFQYADYGIVGDAVKILPALTAALAR

How can I get this output?


Solution

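  • repr is not meant for final output; it is essentially a debugging tool, which is why it wraps the sequence in Seq('...') and shortens long sequences with "...". What you have is a Seq object. You probably want:

    print(seq_record.seq)

    which calls the Seq object's __str__ method and prints the full sequence as plain text.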
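For reference, here is a minimal sketch of the same idea wrapped in small helpers. It assumes the file is valid FASTA; the names format_record and print_fasta are made up for illustration and are not part of Biopython. SeqIO.parse already joins a sequence that is wrapped over several lines in the file, so str(record.seq) should give it back as one string.

```python
def format_record(record):
    """Return the record ID and its full sequence, one per line.

    `record` is expected to behave like a Biopython SeqRecord,
    i.e. to expose `.id` and `.seq` (hypothetical helper).
    """
    # str() on a Seq yields the complete sequence as plain text,
    # without the Seq('...') wrapper and without truncation.
    return f"{record.id}\n{str(record.seq)}"


def print_fasta(path):
    """Print the ID and full sequence of every record in a FASTA file."""
    # Imported here so format_record can be used without Biopython installed.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        print(format_record(record))
```

Calling print_fasta("aminoacids.txt") would then print each ID followed by its sequence on the next line, in the layout asked for in the question.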