
Subprocess.call() in a function doesn't pause the script that calls the function


I searched, found several suggested solutions, tried them, and got the same result every time. I tried Popen.wait(), run() and call(). As other users suggested, I also tried passing the command as a list of strings; that didn't work either. The subprocess call doesn't raise an error, so that's not the issue.

Here's the function:

import subprocess

# output_path and db_path are module-level settings defined elsewhere in blastn (not shown)
def blast(file):
    command = f'blastn -query {output_path}fasta_files/{file} -db {db_path} -max_hsps 1 -max_target_seqs 40 -num_threads 4 -evalue 1e-5 ' \
              f'-out {output_path}blast/{file[:-2]}txt -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend ' \
              f'qlen length sstart send slen evalue mismatch gapopen bitscore stitle"'
    subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).wait()

Here's the call to the function:

import blastn
from process_blast_output import *
from remove_false_sequences import *
import os

directory = '/some/path/'


if __name__ == '__main__':
    for file in os.listdir(directory + 'fasta_files'):
        if 'btcaA1_trimmed' in file:
            blastn.blast(f'{file}') # That's where the function is called
            dataframe = get_dataframe(directory + f'blast/{file[:-2]}txt')
            dataframe = get_taxonomic_data(dataframe)
            delete_false_hits(fasta_to_dictionary(dataframe), directory + f'fasta_files/{file[:-2]}fa')

Instead of passing a string I also tried passing a list:

subprocess.Popen(['blastn', '-query', f'{output_path}fasta_files/{file}', '-db', f'{db_path}', '-max_hsps', '1',
                  '-max_target_seqs', '40', '-num_threads', '4', '-evalue', '1e-5', '-out',
                  f'{output_path}blast/{file[:-2]}txt', '-outfmt', "6 qseqid sseqid pident staxids sskingdoms "
                                                                   "qstart qend qlen length sstart send slen evalue"
                                                                   " mismatch gapopen bitscore stitle"],
                 stdout=subprocess.PIPE).wait()

Solution

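  • Probably the actual problem is that you were setting stdout=subprocess.PIPE but then ignoring the output. Nothing ever reads from that pipe, and the subprocess documentation warns that Popen.wait() can deadlock if the child writes enough output to fill the OS pipe buffer. If you want to discard any output, use stdout=subprocess.DEVNULL; if you want the subprocess to write to standard output normally, just don't set stdout at all.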
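
    A minimal sketch of the three stdout choices, using a short Python one-liner as a stand-in for blastn (an assumption so the example runs anywhere Python is installed):

```python
import subprocess
import sys

# Stand-in for blastn (assumption): a tiny child process that prints one line.
cmd = [sys.executable, "-c", "print('hello from child')"]

# 1. Discard the child's standard output entirely.
discarded = subprocess.run(cmd, stdout=subprocess.DEVNULL)

# 2. stdout not set: the child writes to this script's own standard output.
inherited = subprocess.run(cmd)

# 3. Capture the output and actually use it; run() reads the pipe for you.
captured = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
print("captured:", captured.stdout.strip())
```

    All three calls return only after the child has exited; they differ only in where its output goes.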

    Whether you use shell=True (and a first argument consisting of a single string for the shell to parse) or not (in which case the first argument should be a list of properly tokenized strings) has no bearing on whether the subprocess is waited for.
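
    To check that claim, a small sketch times the same command run both ways. It assumes a POSIX shell for the shell=True form, and a sleeping Python child stands in for blastn:

```python
import shlex
import subprocess
import sys
import time

# Child that sleeps for half a second, standing in for a slow blastn run (assumption).
args = [sys.executable, "-c", "import time; time.sleep(0.5)"]

start = time.monotonic()
subprocess.run(args, check=True)                          # list form, no shell
list_elapsed = time.monotonic() - start

start = time.monotonic()
subprocess.run(shlex.join(args), shell=True, check=True)  # one string parsed by /bin/sh
shell_elapsed = time.monotonic() - start

# Both calls block until the child exits, so both take at least about 0.5 s.
print(f"list form: {list_elapsed:.2f} s, shell form: {shell_elapsed:.2f} s")
```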

    You should generally avoid bare Popen, which starts the process and returns without waiting for it. subprocess.run() and its legacy cousins call(), check_call() and check_output() do wait for the external process to finish.
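
    The difference is easy to see in a sketch (again with a sleeping Python child standing in for blastn): a bare Popen returns at once while the child is still running, whereas run() returns only after it exits. If you really do need Popen with stdout=subprocess.PIPE, communicate() both drains the pipe and waits.

```python
import subprocess
import sys

# Stand-in for blastn (assumption): sleeps briefly, then prints one line.
slow = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]

# Popen starts the child and returns immediately; poll() is None while it runs.
proc = subprocess.Popen(slow, stdout=subprocess.PIPE, text=True)
still_running = proc.poll() is None

# communicate() reads all of stdout (so the pipe cannot fill up) and waits for exit.
out, _ = proc.communicate()

# run() does the waiting for you and returns a CompletedProcess afterwards.
result = subprocess.run(slow, stdout=subprocess.PIPE, text=True, check=True)

print(still_running, proc.returncode, out.strip(), result.stdout.strip())
```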

    As a rule, avoid shell=True when you can.
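
    If you start from a command string like the one in the question, shlex.split() from the standard library does the shell-style tokenizing for you, so you can pass a list without shell=True. A sketch with shortened, hypothetical paths (the /data/... values are placeholders, and nothing is executed here):

```python
import shlex

# Hypothetical paths; the quoted -outfmt value must stay a single argument.
command = ('blastn -query /data/fasta_files/sample.fa -db /data/db/nt '
           '-evalue 1e-5 -out /data/blast/sample.txt '
           '-outfmt "6 qseqid sseqid pident evalue bitscore"')

args = shlex.split(command)
print(args)
```

    The resulting list can be passed directly to subprocess.run(args, check=True).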

    import subprocess

    # output_path and db_path are module-level settings, as in the question
    def blast(file):
        subprocess.run(
            ['blastn', '-query', f'{output_path}fasta_files/{file}',
             '-db', db_path, '-max_hsps', '1', '-max_target_seqs', '40',
             '-num_threads', '4', '-evalue', '1e-5',
             '-out', f'{output_path}blast/{file[:-2]}txt',
             '-outfmt', "6 qseqid sseqid pident staxids sskingdoms qstart qend "
                        "qlen length sstart send slen evalue mismatch gapopen "
                        "bitscore stitle"],
            stdout=subprocess.DEVNULL,  # discard stdout instead of piping it nowhere
            check=True)  # blocks until blastn exits; raises CalledProcessError on failure
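
    Because of check=True, a failing blastn run raises an exception instead of silently leaving you without an output file. A sketch with a Python child that exits with status 2, standing in (hypothetically) for a failed blastn:

```python
import subprocess
import sys

# Hypothetical stand-in for a blastn run that fails with exit status 2.
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]

try:
    subprocess.run(failing, stdout=subprocess.DEVNULL, check=True)
except subprocess.CalledProcessError as exc:
    # run() has already waited for the child; exc carries its exit status.
    print("command failed with exit status", exc.returncode)
    status = exc.returncode
else:
    status = 0
```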
    

    The subprocess you created will be waited for, but it is of course still possible that it created detached subprocesses of its own, which Python cannot directly wait for if the subprocess hides this from the caller.
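
    A small sketch of that caveat, using Python for both levels (an assumption; blastn itself is not involved): the child starts a sleeping grandchild in the background and exits without waiting, so run() returns long before the grandchild finishes.

```python
import subprocess
import sys
import time

# Child script: launch a grandchild that sleeps for 2 s, then exit immediately.
child_code = (
    "import subprocess, sys\n"
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(2)'],\n"
    "                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)\n"
)

start = time.monotonic()
subprocess.run([sys.executable, "-c", child_code], check=True)
elapsed = time.monotonic() - start

# run() waited for the child only; the grandchild is most likely still sleeping.
print(f"run() returned after {elapsed:.2f} s")
```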

    As an aside, your if __name__ == '__main__' code should be trivial; if you put all the useful code in this block, there is no way the file can be useful to import into another script anyway, and so the whole __name__ check is pointless. The point of the check is that you can write

    def useful_code():
        ...  # lots of code here
    
    if __name__ == '__main__':
        useful_code()
    

    Now, if you run python scriptname.py, then __name__ will be '__main__', so the call to useful_code() is executed immediately. But if you import scriptname (assuming you have set things up so that this works, with a correct sys.path and so forth), useful_code is not run on import; instead, the caller decides if and when to run this function (or some other function from the module, if it contains several).
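
    To see both cases side by side, here is a sketch that writes a throwaway module (the name scriptname.py and its contents are illustrative) to a temporary directory, runs it as a script, and then imports it:

```python
import contextlib
import importlib
import io
import subprocess
import sys
import tempfile
from pathlib import Path

module_source = (
    "def useful_code():\n"
    "    print('useful_code ran')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    useful_code()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "scriptname.py"
    script.write_text(module_source)

    # Run as a script: __name__ == '__main__', so useful_code() runs at once.
    as_script = subprocess.run([sys.executable, str(script)],
                               capture_output=True, text=True, check=True)

    # Import it: __name__ == 'scriptname', so nothing runs on import.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # the module file was created at runtime
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        scriptname = importlib.import_module("scriptname")
    import_output = buffer.getvalue()
    sys.path.remove(tmp)

print(repr(as_script.stdout), repr(import_output))
scriptname.useful_code()  # the importer decides when to call it
```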

    As a further aside, f'{file}' is just a really clumsy way to say file (or str(file) if the variable wasn't already a string).
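
    A quick check of that equivalence, with a plain string and with a pathlib.Path (the file name is illustrative):

```python
from pathlib import Path

name = "btcaA1_trimmed.fa"          # illustrative file name
path = Path("fasta_files") / name

same_for_str = f"{name}" == name            # the f-string adds nothing for a str
same_for_path = f"{path}" == str(path)      # for other objects it amounts to str()
print(same_for_str, same_for_path)          # True True
```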