
Biopython parse from variable instead of file


import gzip
import io
from Bio import SeqIO

infile = "myinfile.fastq.gz"
fileout = open("myoutfile.fastq", "w+")
with io.TextIOWrapper(gzip.open(infile, "r")) as f:
    line = f.read()
fileout.write(line)
fileout.seek(0)

count = 0
for rec in SeqIO.parse(fileout, "fastq"): #parsing from file
    count += 1
print("%i reads" % count)

The above works when "line" is written to a file and that file is fed to the parser, but the code below does not work. Why can't "line" be read directly? Is there a way to feed "line" straight to the parser without writing it to a file first?

infile = "myinfile.fastq.gz"
#fileout = "myoutfile.fastq"
with io.TextIOWrapper(gzip.open(infile, "r")) as f:
    line = f.read()
#myout.write(line)

count = 0
for rec in SeqIO.parse(line, "fastq"): #line used instead of writing from file
    count += 1
print("%i reads" % count)

Solution

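  • It's because SeqIO.parse only accepts a file handle or a filename as its first argument. When you pass it a plain string, it treats that string as a path and tries to open a file whose name is the entire FASTQ text, which fails.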

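    If you want to read a gzipped file directly into SeqIO.parse, just pass it a handle. Open the file in text mode ("rt"): gzip.open defaults to binary mode, and SeqIO expects a text-mode handle for FASTQ:

    import gzip
    from Bio import SeqIO
    
    count = 0
    with gzip.open("myinfile.fastq.gz", "rt") as f: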
        for rec in SeqIO.parse(f, "fastq"):
            count += 1
    
    print("{} reads".format(count))
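For the second part of the question (parsing text that is already held in a variable such as "line"), one option is to wrap the string in io.StringIO, which gives SeqIO.parse the file-like handle it expects without writing anything to disk. The following is a minimal sketch, assuming Biopython is installed; the two FASTQ records are made-up sample data standing in for the decompressed file, and the in-memory gzip step is an added illustration for compressed bytes that are already in memory. For a file on disk, passing the gzip handle directly is usually preferable, since it streams records instead of loading the whole file into memory.

```python
import gzip
import io

from Bio import SeqIO

# Hypothetical sample data: two FASTQ records held in a str variable,
# standing in for the decompressed contents of myinfile.fastq.gz.
text = (
    "@read1\n"
    "ACGT\n"
    "+\n"
    "IIII\n"
    "@read2\n"
    "GGCC\n"
    "+\n"
    "IIII\n"
)

# io.StringIO wraps the str in an in-memory text handle, so SeqIO.parse
# reads it as a stream instead of treating it as a filename.
count = sum(1 for _ in SeqIO.parse(io.StringIO(text), "fastq"))
print("{} reads".format(count))

# Illustration only: if gzip-compressed bytes are already in memory,
# decompress them, decode to str, and wrap the result in StringIO.
compressed = gzip.compress(text.encode("ascii"))
decoded = gzip.decompress(compressed).decode("ascii")
count_gz = sum(1 for _ in SeqIO.parse(io.StringIO(decoded), "fastq"))
print("{} reads".format(count_gz))
```

Both counts come from the same two sample records, so each print should report 2 reads.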