python temporary-files blast

NamedTemporaryFile exists, but external program can't access it


This is a follow-up of sorts to this question about using NamedTemporaryFile().

I have a function that creates and writes to a temporary file. I then want to use that file in a different function, which calls a terminal command that uses that file (the program is from the Blast+ suite, blastn).

from subprocess import Popen
from tempfile import NamedTemporaryFile

def db_cds_to_fna(collection="genes"):  # collection gets data from mongoDB

    tmp_file = NamedTemporaryFile()
    # write stuff to file

    return tmp_file

def blast_all(blast_db, collection="genes"):        

    tmp_results = NamedTemporaryFile()    
    db_fna = db_cds_to_fna(collection) # should return another file object

    Popen(
        ['blastn',
         '-query', db_fna.name,
         '-db', blast_db,
         '-out', tmp_results.name,
         '-outfmt', '5']  # xml output
    )

    return tmp_results

When I call blast_all, I get an error from the blastn command:

Command line argument error: Argument "query". File is not accessible:  `/var/folders/mv/w3flyjvn7vnbllysvzrf9y480000gn/T/tmpAJVWoz'

But, just prior to the Popen call, if I do os.path.isfile(db_fna.name) it evaluates to True. I can also do

print Popen(['head', db_fna.name]).communicate(0)

And it properly spits out the first lines of the file. So the file exists, and it's readable. Further, I use the same strategy to call a different program from the same blast+ suite (makeblastdb, see question linked at the top) and it works. Is there possibly some problem with permissions? FWIW blastn returns the same error if the file doesn't exist, but it seems clear that I'm correctly creating the file and it's readable when I make the Popen call, so I'm stumped.


Solution

  • I believe I figured out the things conspiring to cause this behavior. First, Popen() returns as soon as the external command has been started; it does not wait for the command to finish before proceeding past it. Second, as user glibdud mentioned in his answer to my other question, NamedTemporaryFile acts like TemporaryFile in that

    It will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected).

    Since my blast_all() function does not return the query temp file, the last reference to it disappears when the function returns. The file object is then closed and garbage collected while the external blastn command is still running, so the file is deleted. I'm guessing that the external head command goes so quickly it doesn't encounter this problem (and the communicate() call in that test waits for head to finish anyway), but blastn can take up to a couple of minutes to run.

    So the solution is to force Popen() to wait:

    Popen(
        ['blastn',
         '-query', db_fna.name,
         '-db', blast_db,
         '-out', tmp_results.name,
         '-outfmt', '5']  # xml output
    ).wait()
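
For anyone who wants to see the lifetime issue in isolation, here is a minimal sketch of the same idea with a context manager. It is not the blastn call itself: a child Python process stands in for the external program, and the helper name run_on_temp_file and the ".fna" suffix are made up for illustration. It assumes a POSIX system (on Windows, a NamedTemporaryFile generally cannot be opened a second time while it is still open).

```python
import os
import subprocess
import sys
from tempfile import NamedTemporaryFile


def run_on_temp_file(text):
    """Write text to a temp file, run a child process on it, return (exit_code, path).

    Hypothetical helper: the child process is a stand-in for blastn.
    """
    # Stand-in external command: a child Python process that opens and reads the file.
    reader = "import sys; open(sys.argv[1]).read()"
    with NamedTemporaryFile(mode="w", suffix=".fna") as query:
        query.write(text)
        query.flush()  # push buffered data to disk before the child reads it
        # subprocess.call blocks until the child exits, like Popen(...).wait()
        exit_code = subprocess.call([sys.executable, "-c", reader, query.name])
        path = query.name
    # Leaving the with block closes the file, which deletes it.
    return exit_code, path


exit_code, path = run_on_temp_file(">seq1\nACGT\n")
print(exit_code)
print(os.path.exists(path))
```

The with block keeps the file object alive for exactly as long as the child process needs it, so the file cannot be garbage collected early, and the flush() guards against the child seeing an empty or partial file. Once the block exits, the path should no longer exist.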