I'm trying to use the multiprocessing.Pool object to run some database queries in parallel. I'm using MySQLdb.
I have some module-level functions where I define queries to run, like this:
def check_foo(cursor, table):
    query = "(some query)"
    cursor.execute(query)
    results = cursor.fetchall()
    return len(results) == 0
These functions are collected when the program is run, like this:
if __name__ == '__main__':
    check_functions = [v for k, v in globals().items()
                       if k.startswith('check_') and callable(v)]
I also have a module-level function that runs a particular check function on a list of tables:
def run_check_on_all((tables, cursor, f)):
    return [f(cursor, table) for table in tables]
I want to have one worker process for each check function that will call run_check_on_all for that function. Here's my attempt to do that:
if __name__ == '__main__':
    ...
    pool = multiprocessing.Pool(len(check_functions))
    cursors = [conn.cursor() for i in range(len(check_functions))]
    print "Running {0} check(s)...".format(len(check_functions))
    table_lists = [table_list] * len(check_functions)
    all_results = pool.map(run_check_on_all, zip(table_lists, cursors, check_functions))
When I attempt to run this, I get the following error:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/Python2.6/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/local/Python2.6/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/Python2.6/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
As you can (hopefully) see, nothing involved in the call to pool.map is an instance method. run_check_on_all and each of the check_functions are module-level functions. table_lists is a list of lists of strings. cursors is a list of MySQLdb cursor objects.
I thought maybe it had to do with calling the cursor objects' instance methods in the check functions, but I replaced them with dummy functions like this:
def check_foo(cursor, table):
    print "hello"
and still no luck.
Where is the instance method that the error is referring to?
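The problem is that you are trying to pass database cursor objects between processes. pool.map pickles every argument it sends to a worker, including each cursor in the zipped tuples, and a MySQLdb cursor cannot be pickled. The instance method named in the error most likely belongs to the cursor's internal state rather than to anything you wrote, which is why swapping in dummy check functions changes nothing. Each worker process must instead create its own connection to the database and create a cursor on that connection.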
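Here is a minimal sketch of that approach, in which the worker opens its own connection rather than receiving a cursor. To keep it runnable without a MySQL server, it uses the standard library's sqlite3 module as a stand-in for MySQLdb; make_demo_db, check_is_empty, check_has_rows and the table names are hypothetical and exist only for illustration. With MySQLdb you would presumably replace the sqlite3.connect call with MySQLdb.connect and your own connection parameters.

```python
import multiprocessing
import os
import sqlite3
import tempfile


def make_demo_db():
    """Create a throwaway SQLite file with one empty and one non-empty table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE empty_table (id INTEGER)")
    conn.execute("CREATE TABLE full_table (id INTEGER)")
    conn.execute("INSERT INTO full_table VALUES (1)")
    conn.commit()
    conn.close()
    return path


def check_is_empty(cursor, table):
    # Hypothetical check: passes when the table has no rows.
    cursor.execute("SELECT * FROM {0}".format(table))
    return len(cursor.fetchall()) == 0


def check_has_rows(cursor, table):
    # Hypothetical check: passes when the table has at least one row.
    cursor.execute("SELECT COUNT(*) FROM {0}".format(table))
    return cursor.fetchone()[0] > 0


def run_check_on_all(args):
    # Only picklable values cross the process boundary: a path string,
    # a list of table names, and a module-level function.
    db_path, tables, f = args
    # The connection and cursor are created inside the worker process.
    # Assumption: sqlite3 stands in for MySQLdb here.
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()
        return [f(cursor, table) for table in tables]
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = make_demo_db()
    tables = ["empty_table", "full_table"]
    check_functions = [check_is_empty, check_has_rows]
    pool = multiprocessing.Pool(len(check_functions))
    try:
        work = [(db_path, tables, f) for f in check_functions]
        all_results = pool.map(run_check_on_all, work)
    finally:
        pool.close()
        pool.join()
        os.remove(db_path)
    print(all_results)
```

The design point is that the tuples handed to pool.map contain only strings, lists and module-level functions, all of which pickle cleanly. If opening a connection per task is too costly, the initializer argument of multiprocessing.Pool is one place to open a single connection per worker process instead.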