I have a list of refseq IDs (keys_list) that I'm using to pull down sequence records using BioPython Entrez. I'd like to access just the sequence from fasta records returned, but I don't want to have to write the records to file to do so.
I'm trying the foloowing code
for key in key_list:
Entrez.email = "myemailaddress"
handle = Entrez.efetch(db='nuccore', id=key, rettype='fasta')
record = SeqIO.parse(handle, "fasta")
for seq_record in SeqIO.parse(record, "fasta"):
print seq_record.seq
When I run this I'm getting the error:
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/__init__.py", line 538, in parse
yield r
File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib64/python2.6/site-packages/Bio/File.py", line 59, in as_handle
yield handleish
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/__init__.py", line 537, in parse
for r in i:
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/FastaIO.py", line 37, in FastaIterator
line = handle.readline()
AttributeError: 'generator' object has no attribute 'readline'
If I return the entire record with handle.read()
, I can get the whole fasta record, but at this stage I'd just like to access the nucleotide sequence only.
Can anyone help me with this problem?
Many thanks in advance.
Here's what you need.
Instead of:
handle = Entrez.efetch(db='nuccore', id=key, rettype='fasta')
Try this:
handle = Entrez.efetch(db="nucleotide", id=key, retmode="xml") # retmode as 'xml' , db='nucleotide'
features = Entrez.read(handle)[0]
sequence = features['GBSeq_sequence'] # this is your sequence!
Which returns a character string that is your sequence:
'ggctcgcatctctccttcacgcgcccgccgccttacctgaggccgccatccacgccggttgagtcgcgttctgccgcctcccgcctgtggtgcctcctgaactacgtccgccgtctaggtaagtttagagctcaggtcgagaccgggcctttgtccggcgctcccttggagcctacctagactcagccggctctccacgctttgcctgaccctgcttgctcaactctacgtctttgtttcgttttctgttctgcgccgttacagatcgaaagttccacccctttccctttcattcacgactgactgccggcttggcccacggccaagtaccggcaactctgctggctcggagccagcgacagcccattctatagcactctccaggagagaaatttagtacacagttgggggctcgtccgggattcgagcgcccctttattccctaggcaatgggccaaatcttttcccgtagcgctagccctattccgcggccgccccgggggctggccgctcatcactggcttaacttcctccaggcggcatatcgcctagaacccggtccctccagttacgatttccaccagttaaaaaaatttcttaaaatagctttagaaacaccggtctggatctgccccattaactactccctcctagccagcctactcccaaaaggataccccggccgggtgaatgaaattttacacatactcatccaaacccaagcccagatcccgtcccgccccgcgccgccgccgccgtcatcctccacccacgaccccccggattctgacccacaaatcccccctccctatgttgagcctacagccccccaagtccttccagtcatgcacccacatggtgcccctcccaaccaccgcccatggcaaatgaaagacctacaggccattaagcaagaagtctcccaagcggcccctggaagcccccagtttatgcagaccatccggcttgcggtgcagcagtttgaccccactgccaaagacctccaagacctcctgcagtacctttgctcctccctcgtggcttccctccatcaccagcagctagatagccttatatcagaggccgaaactcgaggtattacaggttataaccccttagccggtcccctccgtgtccaagccaacaatccacaacaacaaggattaaggcgagaataccagcaactctggctcgccgccttcgccgccctgccagggagtgccaaagacccttcctgggcctctatcctccaaggcctggaggagccttaccacgccttcgtagaacgcctcaacatagctcttgacaatgggctgccagaaggcacgcccaaagaccccattttacgttccttagcctactctaatgcaaacaaagaatgccaaaaattactacaggcccgagggcacactaatagccctctaggagatatgttgcgggcttgtcaggcctggacccccaaagacaaaaccaaagtgttagttgtccagcctaaaaaaccccccccaaatcagccgtgcttccggtgcgggaaagcaggccactggagtcgggactgcactcagcctcgtcctccccctgggccatgccccctatgtcaagatccaactcactggaagcgagactgcccccgcctaaagcccactatcccagaaccagagccagaggaggatgccctcctattagatctccccgccgacatcccacacccaaaaaactccatagggggggaggtttaacctccccccccacattacagcaagtccttcctaaccaagacccaacatctattctgccagttataccgttagatcccgcccgtcggcccgtaattaaagcccagattgacacccagaccagccacccaaagactatcgaagctctactagatacaggagcagacatgacagtccttccgatagccttgttctcaagtaatactcccctcaaaaacacatccgtgttaggggcagggggccaaacccaagatcactttaagctcacctcccttcctgtgctaatacgcctccctttccggacgacgcctattgttttaacatcttgcctagttgataccaaaaacaactgggccatcataggtcgtgatgccttacaacaatgccaaggcgtcctgtacctccctgaggcaaaaaggccgcctgtaatcttgccaatacaggcgccagctgtccttgggctagaacacctcccaaggccccccgaaatcagccagttccctttaaaccagaacgcctccaggccttgcaacacttggtccggaaggccctggaggcaggccatatcgaaccctacaccgggccaggaaataacccagtattcccagttaaaaaagccaatggaacctggcgattcatccacgacctgcgggccactaactctctaaccatagatctctcatcatcttcccccgggccccctgacttgtccagcctgccaactacactagcccacttacaaactatagaccttaaagacgcctttttccaaatccccctacctaaacagttccagccctactttgctttcactgtcccacagcagtgtaactacggccccggcactagatacgcctggagagtactaccccaagggtttaaaaatagtcccaccctgttcgaaatgcagctggcccatatcctgcagcccattcggcaagccttcccccaatgcactattcttcagtacatggatgacattctcctggcaagcccctcccatgcggacctgcaactactctcagaggccacaatggcttccctaatctcccatgggttgcctgtgtccgaaaacaaaacccagcaaacccctggaacaattaagttcctagggcaaataatttcacctaatcacctcacttatgatgcagtccccaaggtacctatacggtcccgctgggcgctacctgaacttcaagccctacttggcgagattcagtgggtctccaaaggaactcctaccttacgccagccccttcacagtctctactgtgccttacaaaggcatactgatccccgagaccaaatatatttaaatccttctcaagttcaatcattagtgcagctgcggcaggccctgtcacagaactgccgcagtagactagtccaaaccctgcccctcctaggggctattatgctgaccctcactggcaccaccactgtggtgttccagtccaagcagcagtggccacttgtctggctacatgcccccctaccccacactagccagtgcccctgggggcagctacttgcctcagctgtgttattactcgacaaatacaccttgcaatcctatggactactctgccaaaccatacatcataacatctccacccaaaccttcaaccaattcattcaaacatctgaccaccccagtgttcctatcttactccaccacagtcaccgattcaaaaatttaggtgcccagactggagaactttggaacacttttcttaaaacaactgccccattggctcctgtgaaagcccttatgccagtgtttactctttcccctgtgatcataaacaccgccccttgcctgttttcagacggatccacctcccaggcagcctatattctctgggacaagcatatattgtcacaaagatcattcccccttccgccaccgcacaagtcggcccaacgggccgaacttctcggacttttgcatggcctctccagcgcccgttcgtggcgctgtctcaacatatttctagactccaagtatctttatcattaccttcggacccttgccctaggcaccttccaaggcaggtcctctcaggccccctttcaggccctcctgccccgcttactatcgcgtaaggtcgtctatttgcaccacgttcgcagccataccaatctacctgatcccatctccaggctcaacgctctcacagatgccctactaatcacccctgtcctgcagctctctcctgcagacctacacagtttcacccattgcggacagacggccctcacactgcaaggggcaaccacaactgaggcctccaatatcctgcgctcttgccacgcctgccgcaaaaataacccacaacatcagatgcctcaaggacacatccgccgtggcctactccctaaccacatctggcaaggcgacattacccatttcaaatataaaaatacactgtatcgccttcatgtatgggtagacaccttttcaggagccatctcagctacccaaaagagaaaagaaacaagctcagaagctatttcctctttgctccaggccattgcctatctaggcaagcctagctacataaacacagacaatggccctgcctatatttcccaagacttcctcaatatgtgtacctcccttgctattcgccatactacccatgtcccctacaatccaaccagctccggacttgtagaacgctctaatggcattcttaaaaccctattatataagtactttactgacaaacccgacctacctatggataatgctctatccatagccctatggacaatcaaccacctaaatgtattaaccaactgccacaaaacccgatggcagcttcaccactccccccgactccagccgatcccagagacacattccctcagcaataaacaaacccattggtattatttcaagcttcctggtcttaatagccgccagtggaaaggaccacaggaggctcttcaagaagctgccggcgctgctctcatcccggtaagcgctagttctgcccagtggatcccgtggaggctcctcaagcgagctgcatgcccaagacccgtcggaggccccgccgatcccaaagaaaaagaccaccaacaccatgggtaagtttctcgccactttgattttattcttccagttctgccccctcatcctcggtgattacagccccagctgctgtactctcacagttggagtctcctcataccactctaaaccctgcaatcctgcccagccagtttgttcatggaccctcgacctgctggccctttcagcagatcaggccctacagccaccctgccctaatctagtaagttactccagctaccatgccacctattccctatatctattccctcattggatcaaaaagccaaaccgaaatggcggaggctattattcagcctcttattcagacccttgttccttaaaatgcccatacctagggtgccaatcatggacctgcccctatacaggagccgtctccagcccctactggaaatttcagcaagatgtcaattttactcaagaagtttcacacctcaatattaatctccatttttcaaaatgcggtttttccttctcccttctagtcgacgctccaggatatgaccccatctggttccttaataccgaacccagccaactgcctcccaccgcccctcctctactctcccactctaacctagaccatatcctcgagccctctataccatggaaatcaaaactcctgactcttgtccagttaaccctacaaagcactaattatacttgcattgtctgtatcgatcgtgccagcctatccacttggcacgtcctatactctcccaacgtctctgttccatccccttcttctacccccctcctttacccatcgttagcgcttccagccccccacctgacgttaccatttaactggacccactgctttgacccccagattcaagctatagtctcctccccctgtcataactccctcatcctgccccccttttccttgtcacctgttcccacgctaggatcccgctcccgccgagcagtaccggtggcggtctggcttgtctccgccctggccatgggagccggagtggctggcaggattaccggctccatgtccctcgcctcaggaaagagcctcctacatgaggtggacaaagatatttcccaattaactcaagcaatagtcaaaaaccacaaaaatctgctcaaaattgcacagtatgctgcccagaacagacgaggccttgatctcctgttctgggagcaaggaggattatgcaaagcattacaagaacagtgctgttttctaaatattactaattcccatgtctcaatactacaagagagacccccccttgaaaatcgagtcctgactggctggggccttaactgggaccttggcctctcacagtgggctcgagaagccttacaaactggaatcacccttgtcgcgctactccttcttgttatccttgcaggaccatgcatcctccgtcagctacgacacctcccctcgcgcgtcagatacccccattactctcttataaaccctgagtcatccctgtaaaccaagcacacaattattgcaaccacatcgcctccagcctcccctgccaataattaacctctcccatcaaatcctccttctcctgcagcaacctcctccgttcagcctccaaggactccacctcgccttccaactgtctagtatagccatcaacccccaactcctgcattttttctttcctagcactatgctgtttcgccttctcagccccttgtctccacttgcgctcacggcgctcctgctcttcctgctttctccgggcgaagtcagcggccttctcctccgcccgcttcctgcgccgtgccttctcctcttccttccttttcaaatactcagcaatctgcttttcctcctctttctcccgctctttttttcgcttcctcttctcctcagcccgtcgctgccgatcacgatgcgtttccccgcgaggtggcgctttcccccctggagggccccgtcgcagccggccgcggctttcctcttctagagatagcaaaccgtcaagcacagtttcctcctcctccttgtcctttaactcttcctccaaggataatagcccgtccaccaattcctccaccagcaggtcctccgggcatggaacaggcaaacatcgaaacagccctacggatacaaagttaaccatgcttattatcagcccacttcccagggtttggacagagtcttcttttcggatacccagtctacgtgtttggagactgtgtacaaggcgactggtgccccatctctgggggactatgttcggcccgcctacatcgtcacgccctactggccacctgtccagagcatcagatcacctgggaccccatcgatggacgcgttatcggctcagctctacagttccttatccctcgactcccctccttccccacccagagaacctctaagacccttaaggtccttaccccgccaatcactcatacaacccccaacattccaccctccttcctccaggccatgcgcaaatactcccccttccgaaatggatacatggaacccacccttgggcagcacctcccaaccctgtcttttccagaccccggactccggccccaaaacctgtacaccctctggggaggctccgttgtctgcatgtacctctaccagctttccccccccatcacctggcccctcctgccccatgtgattttttgccaccccggccagctcggggccttcctcaccaatgttccctacaaacgaatagaaaaactcctctataaaatttcccttaccacaggggccctaataattctacccgaggactgtttgcccaccacccttttccagcctgctagggcacccgtcacgctgacagcctggcaaaacggcctccttccgttccactcaaccctcaccactccaggccttatttggacatttaccgatggcacgcctatgatttccgggccctgccctaaagatggccagccatctttagtactacagtcctcctcctttatatttcacaaatttcaaaccaaggcctaccacccctcatttctactctcacacggcctcatacagtactcttcctttcataatttgcatctcctatttgaagaatacaccaacatccccatttctctactttttaacgaaaaagaggcagatgacaatgaccatgagccccaaatatcccccgggggcttagagcctctcagtgaaaaacatttccgtgaaacagaagtctgagaaggtcagggcccagaataaggctctgacgtctccccccggaggacagctcagcaccagctcaggctaggccctgacgtgtccccctaaagacaaatcataagctcagacctccgggaagccaccgggaaccacccatttcctccccatgtttgtcaagccgtcctcaggcgttgacgacaacccctcacctcaaaaaacttttcatggcacgcatacggctcaataaaataacaggagtctataaaagcgtggggacagttcaggagggggctcgcatctctccttcacgcgcccgccgccttacctgaggccgccatccacgccggttgagtcgcgttctgccgcctcccgcctgtggtgcctcctgaactacgtccgccgtctaggtaagtttagagctcaggtcgagaccgggcctttgtccggcgctcccttggagcctacctagactcagccggctctccacgctttgcctgaccctgcttgctcaactcta'