I'm trying to modify a dictionary (file) with a multiprocessing pool. However, I can't make it happen.
Here is what I'm trying:
import json
import multiprocessing

def teste1(_dict, _iterable):
    file1[f'{_iterable}'] = {'relevant': True}

file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}
pool = multiprocessing.Pool(4)
manager = multiprocessing.Manager()
dicto = manager.dict()
pool.apply_async(teste1, (file1, file1))
print(file1)
However, it's still printing out the same as before: {'item1': {'relevant': False}, 'item2': {'relevant': False}}
Could one noble soul help me out with this?
There are multiple issues with your approach:
You are attempting to share a dictionary (file1) across several processes, but each worker only ever receives a copy of it, so changes made inside teste1 never reach the parent's file1. To solve this, refer to: multiprocessing: How do I share a dict among multiple processes? (there is also a short sketch after these points).
You are not really iterating over the keys: you pass the whole (copied) dictionary as _iterable, so teste1 ends up trying to index with a dictionary itself rather than with its keys.
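To make the first point concrete, here is a minimal sketch (illustrative only; update_entry is a placeholder name) showing that a write to a manager.dict() proxy made in a child process is visible to the parent, which is what "properly sharing" the dict means here:

import multiprocessing

def update_entry(shared, key):
    # The proxy forwards this write to the manager process,
    # so the parent process sees the change.
    shared[key] = {'relevant': True}

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        shared = manager.dict({'item1': {'relevant': False}})
        p = multiprocessing.Process(target=update_entry, args=(shared, 'item1'))
        p.start()
        p.join()
        print(dict(shared))  # {'item1': {'relevant': True}}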
Assuming that what you need is a dictionary with values updated by parallel processes, you have two choices:
A. Share the dictionary across processes and iterate over keys like:
for key in file1.keys():
    pool.apply_async(teste1, (file1, key))  # assuming file1 is properly shared, e.g. a manager.dict()
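Put together, option A could look like the following minimal sketch (assuming a manager.dict() is an acceptable way to share file1; this is one possible arrangement, not the only one):

import multiprocessing

def teste1(_dict, key):
    # _dict is a manager proxy here, so this write is visible to the parent
    _dict[key] = {'relevant': True}

if __name__ == '__main__':
    file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}
    with multiprocessing.Manager() as manager:
        shared = manager.dict(file1)
        pool = multiprocessing.Pool(4)
        for key in file1.keys():
            pool.apply_async(teste1, (shared, key))
        pool.close()
        pool.join()
        print(dict(shared))  # {'item1': {'relevant': True}, 'item2': {'relevant': True}}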
B. A simpler approach, where you build the resulting dictionary from the return values of the parallel teste1 calls:
import multiprocessing

def teste1(dict_key):
    # some logic dependent on dict_key
    return {'relevant': True}

file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}

pool = multiprocessing.Pool(4)
results = pool.map(teste1, file1.keys())  # one call per key, results returned in input order
pool.close()
pool.join()

# file1.keys() preserves insertion order, so results and file1.keys() correspond one-to-one
file2 = {k: v for k, v in zip(file1.keys(), results)}
print(file2)
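With the stub teste1 above, this prints {'item1': {'relevant': True}, 'item2': {'relevant': True}}; replace the function body with whatever per-key logic you actually need.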