I looked and found solutions, tried them and got the same result. I tried using Popen.wait(), run() and call(). As suggested by other users, I also tried passing the command as a list of strings. Didn't work. The subprocess call doesn't give an error, so that's not the issue.
Here's the function:
def blast(file):
    command = f'blastn -query {output_path}fasta_files/{file} -db {db_path} -max_hsps 1 -max_target_seqs 40 -num_threads 4 -evalue 1e-5 ' \
              f'-out {output_path}blast/{file[:-2]}txt -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend ' \
              f'qlen length sstart send slen evalue mismatch gapopen bitscore stitle"'
    subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).wait()
Here's the call to the function:
import blastn
from process_blast_output import *
from remove_false_sequences import *
import os

directory = '/some/path/'

if __name__ == '__main__':
    for file in os.listdir(directory + 'fasta_files'):
        if 'btcaA1_trimmed' in file:
            blastn.blast(f'{file}')  # That's where the function is called
            dataframe = get_dataframe(directory + f'blast/{file[:-2]}txt')
            dataframe = get_taxonomic_data(dataframe)
            delete_false_hits(fasta_to_dictionary(dataframe), directory + f'fasta_files/{file[:-2]}fa')
Instead of passing a string I also tried passing a list:
subprocess.Popen(['blastn', '-query', f'{output_path}fasta_files/{file}', '-db', f'{db_path}', '-max_hsps', '1',
'-max_target_seqs', '40', '-num_threads', '4', '-evalue', '1e-5', '-out',
f'{output_path}blast/{file[:-2]}txt', '-outfmt', "6 qseqid sseqid pident staxids sskingdoms "
"qstart qend qlen length sstart send slen evalue"
" mismatch gapopen bitscore stitle"],
stdout=subprocess.PIPE).wait()
Probably the actual problem is that you were setting stdout=subprocess.PIPE but then ignoring the output. If nothing reads from the pipe, the subprocess can block once the pipe buffer fills up. If you want to discard any output, use stdout=subprocess.DEVNULL; if you want to allow the subprocess to write to standard output normally, just don't set stdout at all.
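To illustrate the difference, here is a minimal sketch. It uses a small Python one-liner as a stand-in child process (an assumption for illustration, so the example runs without blastn installed) and shows that discarding output leaves nothing to read, while capturing it only makes sense if you then read the result.

```python
import subprocess
import sys

# Stand-in child command (hypothetical, used instead of blastn so this runs anywhere).
child = [sys.executable, '-c', 'print("hello from child")']

# Discard the child's standard output entirely.
discarded = subprocess.run(child, stdout=subprocess.DEVNULL, check=True)

# Capture the output only when you actually intend to read it.
captured = subprocess.run(child, stdout=subprocess.PIPE, text=True, check=True)

print(discarded.stdout)  # nothing was captured here
print(captured.stdout)   # the text the child printed
```

Leaving out the stdout argument altogether would let the child print straight to your terminal.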
Whether you use shell=True (and a first argument consisting of a single string for the shell to parse) or not (in which case the first argument should be a list of properly tokenized strings) has no bearing on whether the subprocess is waited for.
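As a rough illustration of what "properly tokenized" means, the standard library's shlex.split() splits a command string along the same lines as a POSIX shell would. The command below is a shortened, hypothetical variant of the one in the question (sample.fa is a made-up file name), and nothing is executed here.

```python
import shlex

# Shortened, hypothetical version of the blastn command line from the question.
command = 'blastn -query sample.fa -outfmt "6 qseqid sseqid pident" -evalue 1e-5'

# Tokenize the string using POSIX shell quoting rules.
args = shlex.split(command)

# The quoted -outfmt value stays together as a single list element.
print(args)
```

The resulting list is the shape you would pass to subprocess.run() without shell=True.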
You should generally avoid Popen, which does not wait by default. subprocess.run() and its legacy cousins check_call() et al. do wait for the external subprocess.
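A small sketch of that difference, again with a stand-in child (a Python one-liner that sleeps for one second, an assumption chosen purely for illustration): Popen hands control back while the child is still running, whereas run() only returns once the child has exited.

```python
import subprocess
import sys
import time

# Stand-in child (hypothetical): sleeps for one second, then exits.
child = [sys.executable, '-c', 'import time; time.sleep(1)']

# Popen returns as soon as the child has been started.
proc = subprocess.Popen(child)
still_running = proc.poll() is None  # poll() gives None while the child is alive
proc.wait()                          # an explicit wait is needed with Popen

# run() blocks until the child has finished.
start = time.monotonic()
subprocess.run(child, check=True)
run_elapsed = time.monotonic() - start

print(f'child still running right after Popen: {still_running}')
print(f'run() returned after {run_elapsed:.2f} seconds')
```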
Generally, probably avoid shell=True if you can.
import subprocess

def blast(file):
    # output_path and db_path are module-level variables, as in the question
    subprocess.run(
        ['blastn', '-query', f'{output_path}fasta_files/{file}',
         '-db', db_path, '-max_hsps', '1', '-max_target_seqs', '40',
         '-num_threads', '4', '-evalue', '1e-5',
         '-out', f'{output_path}blast/{file[:-2]}txt',
         '-outfmt', "6 qseqid sseqid pident staxids sskingdoms qstart qend "
                    "qlen length sstart send slen evalue mismatch gapopen "
                    "bitscore stitle"],
        stdout=subprocess.DEVNULL, check=True)
The subprocess you created will be waited for, but it is of course still possible that it created detached subprocesses of its own, which Python cannot directly wait for if the subprocess hides this from the caller.
As an aside, your if __name__ == '__main__' code should be trivial; if you put all the useful code in this block, there is no way the file can be useful to import into another script anyway, and so the whole __name__ check is pointless. The purpose of this is so you can say
def useful_code():
    # lots of code here
    ...

if __name__ == '__main__':
    useful_code()
Now, if you run python scriptname.py, then __name__ will be __main__ and so the call to useful_code() will be executed immediately. But if you import scriptname (assuming you have set things up so that you can do this, with a correct sys.path and so forth), that will not cause useful_code to be run immediately; instead, the caller decides if and when they actually want to run this function (or some other function from the module, if it contains several).
As a further aside, f'{file}' is just a really clumsy way to say file (or str(file) if the variable wasn't already a string).