I'd like to loop through every file in a user-supplied directory and apply a specific transformation to every file that ends with ".fastq".
Basically this would be the pipeline:
This is what I have (Python and Biopython):
import sys, os
from Bio import SeqIO
from Bio.SeqIO.QualityIO import FastqGeneralIterator
from pathlib import Path

path = Path(sys.argv[1])
print(path)
glob_path = path.glob('*')

for file_path in glob_path:
    if file_path.endswith(".fastq"):
        with open(glob_path, "rU") as input_fq:
            with open("{}.fasta".format(file_path),"w") as output_fa:
                for (title, sequence, quality) in FastqGeneralIterator(input_fq):
                    output_fa.write(">%s\n%s\n" \
                                    % (title, sequence))

if not os.path.exists(path):
    raise Exception("No file at %s." % path)
The script runs, but it does not produce the output: the FASTA files are never created. How can I make the script loop through the files of a specific directory and pass the full path of each file into the for loop, so that the content of input_fq is read and the transformed records are saved to output_fa?
Your problem is with this line:
with open(glob_path, "rU") as input_fq:
Remember that glob_path is an iterator over all of the paths in the user-supplied directory, not a single file, so open() cannot accept it. You want to open file_path, which is the individual path produced on each pass through the loop:
with open(file_path, "rU") as input_fq:
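To make the difference concrete, here is a small sketch that uses only the standard library. The temporary directory and the file name sample.fastq are made up for the demonstration. It shows that passing the glob result itself to open() is rejected, while each item it yields opens fine:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    # Hypothetical one-record FASTQ file, created only for this demo.
    (directory / "sample.fastq").write_text("@read1\nACGT\n+\nIIII\n")

    glob_path = directory.glob("*")

    try:
        open(glob_path)          # the mistake from the question
        error_name = None
    except TypeError as exc:     # open() only accepts path-like objects
        error_name = type(exc).__name__

    # Each item yielded by glob() is a Path, which open() does accept.
    names = []
    for file_path in glob_path:
        with open(file_path) as handle:
            names.append((file_path.name, handle.readline().strip()))

print(error_name)
print(names)
```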
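Also, to be more succinct, you can eliminate your first if statement by globbing for the pattern "*.fastq" directly. This also sidesteps file_path.endswith(".fastq"), which fails because Path objects have no endswith method: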
glob_path = path.glob('*.fastq')
for file_path in glob_path:
    with open(file_path, "rU") as input_fq:
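Putting the pieces together, one possible shape for the whole script is sketched below. It is not the only way to do it, and a few choices are my own assumptions rather than anything from the question: the function name convert_directory, the parse_fastq argument (a hook so the loop can be tried without Biopython installed; by default it falls back to FastqGeneralIterator), the output name reads.fasta instead of reads.fastq.fasta, and the directory check moved before the loop. It also opens the input with plain "r", since the "U" mode flag was removed in Python 3.11.

```python
from pathlib import Path


def convert_directory(directory, parse_fastq=None):
    """Write a .fasta file next to every .fastq file in `directory`.

    `parse_fastq` is a hypothetical hook: any callable that takes an open
    text handle and yields (title, sequence, quality) tuples. When it is
    omitted, Biopython's FastqGeneralIterator is used.
    """
    if parse_fastq is None:
        # Imported lazily so the directory loop works without Biopython
        # whenever a custom parser is supplied.
        from Bio.SeqIO.QualityIO import FastqGeneralIterator as parse_fastq

    directory = Path(directory)
    # Validate the input before doing any work, not after the loop.
    if not directory.is_dir():
        raise NotADirectoryError("No directory at %s." % directory)

    written = []
    # glob() yields one Path per match; sorted() makes the order predictable.
    for fastq_path in sorted(directory.glob("*.fastq")):
        # Assumption: name the output reads.fasta rather than reads.fastq.fasta.
        fasta_path = fastq_path.with_suffix(".fasta")
        with open(fastq_path, "r") as input_fq, open(fasta_path, "w") as output_fa:
            for title, sequence, _quality in parse_fastq(input_fq):
                output_fa.write(">%s\n%s\n" % (title, sequence))
        written.append(fasta_path)
    return written
```

From the command line this would be called as convert_directory(sys.argv[1]) after importing sys. The function returns the list of FASTA paths it wrote, which makes it easy to confirm that something was actually produced.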