Linux Time Command causes Exception for Multi-context With Statement


Background

I am using Python 2.7.6 to parse chunks of very large files (20+ GB) in parallel with the multiprocessing module. I have the worker processes extract information from the input file and put the results in a shelved dictionary for later processing. To prevent simultaneous writes to the pseudo-database, I am using a managed lock. I have also implemented a context manager for the database access to ensure it is always closed, because the shelve module doesn't natively support context manager functionality until Python 3.4.

The Problem

I would like to measure overall run time with the Linux time command. However, when I run the script with the time command, I get a SyntaxError exception that I don't get if I run it normally. Example code:

import multiprocessing
import shelve
from contextlib import contextmanager

DB_NAME = 'temp_db'

# manually implemented context manager - not natively implemented until Python 3.4
# I could use contextlib.closing, but this method makes the "with" statements cleaner
@contextmanager
def open_db(db_name, flag='c'):
    db = shelve.open(db_name, flag=flag)
    try:
        yield db
    finally:
        db.close()

db_lock = multiprocessing.Manager().Lock()

with db_lock, open_db(DB_NAME) as db:
    db['1'] = 'test_value1'
    db['2'] = 1.5

with db_lock, open_db(DB_NAME) as db:
    for key, val in db.iteritems():
        print("{0} : {1}\n".format(key, val))

Running python test_script.py produces the expected output:

2 : 1.5

1 : test_value1

On the other hand, running time python test_script.py causes an exception:

  File "test_script.py", line 21
    with db_lock, open_db(DB_NAME) as db:
                        ^
SyntaxError: invalid syntax
0.005u 0.002s 0:00.01 0.0%      0+0k 0+0io 0pf+0w

The Question

Why would the time command affect what the interpreter considers valid syntax?

Other Notes

  1. I assume the time command is being invoked correctly because it does produce the timing information, and the presence of the exception shows that the interpreter is finding the correct script.
  2. If I eliminate either the acquisition of the lock or the database opening, the exception disappears, so the problem appears to be caused by the comma in the with statement.

Solution

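  • Something is causing the python executable (and therefore the Python version) to change when the script is run under time. The multi-context form of the with statement (with a, b as c:) was only added in Python 2.7, so a SyntaxError at the comma suggests that an older interpreter, such as Python 2.6, is being launched. To check, compare the output of these commands: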

    which python
    python -V
    
    time which python
    time python -V
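
If the shell commands are inconclusive, the interpreter can also report on itself. The following is a minimal sketch, not part of the original answer: the file name check_interpreter.py and both function names are hypothetical. It has to live in its own file, because a SyntaxError in test_script.py is raised at compile time, before any diagnostic code in that script could run. It avoids newer syntax so that it should also run on an old interpreter.

```python
# check_interpreter.py (hypothetical file name)
import sys


def supports_multi_with(version_info=sys.version_info):
    """Return True if this Python accepts `with a, b as c:`.

    The multi-context form was added in Python 2.7 and 3.1.
    """
    major_minor = tuple(version_info[:2])
    return major_minor >= (2, 7) and major_minor != (3, 0)


def describe_interpreter():
    """Return the interpreter path and version as one line of text."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return "%s (Python %s)" % (sys.executable, version)


if __name__ == "__main__":
    print(describe_interpreter())
    print("multi-context with supported: %s" % supports_multi_with())
```

Running it once as python check_interpreter.py and once as time python check_interpreter.py should show whether the two invocations resolve to different executables.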
    

  • For the overall project, consider having each worker just return its data to the parent process, which then stores the information in a file or database. This simplifies the code because no locking is needed: only the parent ever accesses the database.
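
One possible shape for that layout is sketched below, assuming each worker can return a picklable (key, value) pair. The names parse_chunk, store_results and main are hypothetical, and parse_chunk only stands in for the real parsing of one chunk. contextlib.closing is used so the sketch also runs on Python 2.7, where shelve has no native context manager support.

```python
import multiprocessing
import shelve
from contextlib import closing

DB_NAME = 'temp_db'  # same database name as in the question


def parse_chunk(chunk_id):
    """Hypothetical worker: stands in for parsing one chunk of the input."""
    return str(chunk_id), 'value_for_chunk_{0}'.format(chunk_id)


def store_results(results, db_name=DB_NAME):
    """Write (key, value) pairs to the shelf. Only the parent calls this."""
    with closing(shelve.open(db_name, flag='c')) as db:
        for key, value in results:
            db[key] = value


def main(chunk_ids, db_name=DB_NAME, processes=2):
    pool = multiprocessing.Pool(processes=processes)
    # try/finally because Pool is not a context manager on Python 2.7
    try:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so the parent can store results while other chunks are still parsed
        store_results(pool.imap_unordered(parse_chunk, chunk_ids), db_name)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    main(range(4))
```

Because the shelf is opened only in the parent, neither the managed lock nor the multi-context with statement is needed in this variant.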