I am processing a large BLAST file together with a large FASTA file, and I need to load several lines of FASTA for one block of BLAST (let's say it is one line).
I would expect that in the second loop (line) of BLAST, it will continue on the next line from the last-processed FASTA line, but it is loading all the same FASTA lines. Why? And how can I load the next line? Is it really necessary to add some indexing?
with open(fastaname,'r') as fastafile:
with open(blastfilename,'r') as blastfile:
for line in blastfile:
while True:
fastaline = fastafile.readline()[:-1]
if fastaline[0]=='>':
break
fastaseq += fastaline
somefunction(line,fastaseq)
FASTA has the typical format:
>name_of_seqence\n
ACGATCATCGTAGCTGCATGACTGCA\n
GATCGATCTGATCGATGCAGTCAGTA\n
>name_of_seqence\n
GCACGCGACCACGATCATTGACTGCA\n
CAAAAGATCTGATCGATGCAGTCAGT\n
CAGTCGATGCTAGTCGATGCTCGATA\n
etc.
I need each sequence for every line of the next BLAST sequence.
I don't see anything that would ever clear the lines you're appended to fastaseq
after finishing a line from your blast file. Try this:
with open(fastaname,'r') as fastafile:
with open(blastfilename,'r') as blastfile:
for line in blastfile:
fastaseq = '' # or whatever data type this is
while True:
fastaline = fastafile.readline()[:-1]
if fastaline[0]=='>':
break
fastaseq += fastaline
somefunction(line,fastaseq)
As it is your first blast line would get the first set of fasta lines, the second blast line would get the first and second sets, etc etc.