from multiprocessing import Pool

data_table = None

def init_data_table(my_data_table = [], *args):
    global data_table
    data_table = my_data_table

def process_data(index):
    # create data processor object and run cpu intensive task here
    return str(index) + " " + data_table[index][0]

def main():
    # call db functions once and get data table from db
    data_table = ...
    pool = Pool(processes = 4, initializer=init_data_table, initargs=(data_table))
    x = pool.map(process_data, range(10))
The problem is that when I pass data_table this way and try to access it later in the workers, it does not work. I get this error:
IndexError: list index out of range
I'm not sure if this is the correct way of passing a complex data structure, such as a tuple or a list of lists, into Pool() so that it can be accessed by the forked child processes. Essentially, it is a shared piece of data that I want to retrieve only once, because it is an expensive call to the db, and I want to make it accessible to the processes.
Any assistance would be greatly appreciated, thanks.
The documentation for multiprocessing.Pool says this about initializer:
If initializer is not None then each worker process will call initializer(*initargs) when it starts.
So in your case, it's calling init_data_table(*data_table). Because of the *, Python unpacks your list of lists and passes each sublist as a separate positional argument to init_data_table. You've defined it as
def init_data_table(my_data_table=[], *args):
So the first sublist ends up in my_data_table and all of the rest end up in a tuple bound to args. Each worker's data_table therefore holds only the first row, which is most likely why indexing it raises the IndexError. To avoid this, you need to wrap data_table in a tuple when you pass it as initargs. It looks like you tried to do this, but you forgot the trailing comma; without it, (data_table) is just data_table in parentheses:
pool = Pool(processes = 4, initializer=init_data_table, initargs=(data_table,))
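The effect of the missing comma can be checked without multiprocessing. The sketch below is only an illustration: the sample data_table is made up, and init_data_table is changed to return its arguments so they can be inspected.

```python
def init_data_table(my_data_table=[], *args):
    # Same signature as in the question; returns what it received so the
    # binding of the arguments can be inspected.
    return my_data_table, args

# Hypothetical sample standing in for the table fetched from the database.
data_table = [["a", 1], ["b", 2], ["c", 3]]

# Parentheses without a comma are only grouping: this is the list itself.
no_comma = (data_table)
# A trailing comma makes a one-element tuple containing the list.
with_comma = (data_table,)

first_arg, rest = init_data_table(*no_comma)
print(first_arg)  # ['a', 1]
print(rest)       # (['b', 2], ['c', 3])

first_arg, rest = init_data_table(*with_comma)
print(first_arg)  # [['a', 1], ['b', 2], ['c', 3]]
print(rest)       # ()
```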
With the comma, Python ends up calling init_data_table(*(data_table,)), which passes your whole data_table list as my_data_table and leaves args empty, which is what you really want to happen.
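Putting it together, a minimal runnable sketch of the corrected script might look like this. The load_data_table function and its sample rows are hypothetical stand-ins for the real database call, and the unused *args parameter is dropped here for simplicity.

```python
from multiprocessing import Pool

# Set once per worker process by the initializer.
data_table = None

def init_data_table(my_data_table):
    # Runs in each worker when it starts; stores the table in a module global.
    global data_table
    data_table = my_data_table

def process_data(index):
    # Placeholder for the CPU-intensive task from the question.
    return str(index) + " " + data_table[index][0]

def load_data_table():
    # Hypothetical stand-in for the expensive database call.
    return [["row%d" % i, i] for i in range(10)]

def main():
    table = load_data_table()  # fetched once, in the parent process
    # Note the trailing comma: initargs is a one-element tuple.
    with Pool(processes=4, initializer=init_data_table, initargs=(table,)) as pool:
        return pool.map(process_data, range(len(table)))

if __name__ == "__main__":
    print(main())
```

Note that each worker ends up with its own copy of the table rather than a truly shared object, so this pattern suits data that the workers only read.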