I'm having some difficulty downloading FASTA sequences for multiple accession numbers listed in a text file using a Python script. I can do this OK for a single accession number, e.g.:
import sys
from Bio import Entrez
Entrez.email = "[email protected]"
handle = Entrez.efetch(db="protein", id="EAS03220", rettype="fasta")
print(handle.read())
But when I try to give it a list of accessions read from a file (see below), I get errors.
import sys
from Bio import Entrez
Entrez.email = "[email protected]"
accessions = []
for line in open(sys.argv[1],"r"):
    line = line.strip()
    accessions.append(line)
for num in accessions:
    handle = Entrez.efetch(db="protein", id="num", rettype="fasta")
    print(handle.read())
Here's an example of how my input file looks:
EAS06781
EAS07087
EAS07113
EAS07200
EAS07226
EAS07230
I'm sure the solution is easy, but I've been reading forums, NCBI help pages and Python-for-beginners books for hours and getting nowhere! Thanks in advance.
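You are passing "num" as a string literal, not the variable num, so every request asks NCBI for an ID literally called "num" instead of the accession from your file.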
Try removing the quotation marks and it should work.
handle = Entrez.efetch(db="protein", id=num, rettype="fasta")
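If it helps, here is a minimal sketch of the whole script with that fix applied. It also skips blank lines in the input file and sends all accessions in one comma-separated efetch request rather than one request per ID. The helper names read_accessions and fetch_fasta are my own, not part of Biopython, and I'm assuming plain-text FASTA output is what you want.

```python
def read_accessions(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def fetch_fasta(accessions, email):
    """Fetch FASTA records for all accessions in a single efetch request."""
    # Imported here so read_accessions() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    # Pass the variable, not the string "num"; IDs are joined with commas.
    handle = Entrez.efetch(db="protein", id=",".join(accessions),
                           rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

You would then call something like print(fetch_fasta(read_accessions(sys.argv[1]), your_email)) after importing sys. For a very long list it is probably safer to split the accessions into smaller batches, but for a handful of IDs a single request should be fine.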