I am using Python 2.7.6 to parse chunks of very large files (20+ GB) in parallel with the multiprocessing module. I have the worker processes extract information from the input file and put the results in a shelved dictionary for later processing. To prevent simultaneous writes to the pseudo-database, I am using a managed lock. I have also implemented a context manager for the database access to ensure it is always closed, because the shelve module doesn't natively support context manager functionality until Python 3.4.
I would like to measure overall run time with the Linux time command. However, when I run the script with the time command, I get a SyntaxError exception that I don't get if I run it normally. Example code:
import multiprocessing
import shelve
from contextlib import contextmanager

DB_NAME = 'temp_db'


# manually implemented context manager - not natively implemented until Python 3.4
# I could use contextlib.closing, but this method makes the "with" statements cleaner
@contextmanager
def open_db(db_name, flag='c'):
    db = shelve.open(db_name, flag=flag)
    try:
        yield db
    finally:
        db.close()


db_lock = multiprocessing.Manager().Lock()

with db_lock, open_db(DB_NAME) as db:
    db['1'] = 'test_value1'
    db['2'] = 1.5

with db_lock, open_db(DB_NAME) as db:
    for key, val in db.iteritems():
        print("{0} : {1}\n".format(key, val))
Running python test_script.py produces the expected output:
2 : 1.5
1 : test_value1
On the other hand, running time python test_script.py causes an exception:
  File "test_script.py", line 21
    with db_lock, open_db(DB_NAME) as db:
                ^
SyntaxError: invalid syntax
0.005u 0.002s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
Why would the time command affect what the interpreter considers valid syntax?
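The error points at the with statement. Listing several context managers in a single with statement is only valid syntax from Python 2.7 (and 3.1) onward, so an older interpreter rejects that line. Something is causing the python executable (and version) to change when the script is run under time. Try these commands: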
which python
python -V
time which python
time python -V
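The same check can be made from inside Python. The following is a minimal sketch, not part of the original script: the function names are hypothetical, and it assumes the file is saved separately and run both as python check.py and time python check.py. It deliberately avoids the multi-item with statement itself so that it still parses on an older interpreter.

```python
import sys


def interpreter_info():
    """Return the path and (major, minor, micro) version of this interpreter."""
    return sys.executable, tuple(sys.version_info[:3])


def supports_multi_with():
    """Return True if this interpreter can parse 'with a, b:'."""
    try:
        # compile() only parses the source; the names a and b are never looked up.
        compile("with a, b:\n    pass\n", "<check>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    path, version = interpreter_info()
    print("{0} {1}".format(path, ".".join(str(n) for n in version)))
    print("multi-item with supported: {0}".format(supports_multi_with()))
```

If the two runs print different paths or versions, the shell is resolving python differently in each case.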
For the overall project, consider having each worker just return data to the parent, which then stores info in a file or database. This simplifies the code because you don't need locking -- only the parent has access.
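A rough sketch of that layout follows, assuming a multiprocessing.Pool is acceptable for the real workload. The names parse_chunk, store_results and run are hypothetical, and the arithmetic in parse_chunk is only a placeholder for the actual parsing. It uses contextlib.closing rather than a custom context manager, and each with statement holds a single item, so it also parses on interpreters older than 2.7.

```python
import multiprocessing
import shelve
from contextlib import closing


def parse_chunk(chunk_id):
    """Hypothetical worker: return a (key, value) pair instead of writing it."""
    return str(chunk_id), chunk_id * 1.5  # placeholder for real parsing


def store_results(db_name, results):
    """Write (key, value) pairs to the shelf; called only in the parent."""
    with closing(shelve.open(db_name, flag='c')) as db:
        for key, value in results:
            db[key] = value


def run(db_name, chunk_ids, processes=2):
    """Parse chunks in parallel and store the results from the parent."""
    pool = multiprocessing.Pool(processes)
    try:
        # imap_unordered yields each result as soon as a worker finishes it.
        store_results(db_name, pool.imap_unordered(parse_chunk, chunk_ids))
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    run('temp_db', range(4))
```

Because only the parent process opens the shelf, no lock or Manager is needed in this version.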