I'm trying to modify a dictionary (file) with a multiprocessing pool. However, I can't make it happen.
Here is what I'm trying:
import json
import multiprocessing

def teste1(_dict, _iterable):
    file1[f'{_iterable}'] = {'relevant': True}

file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}
pool = multiprocessing.Pool(4)
manager = multiprocessing.Manager()
dicto = manager.dict()
pool.apply_async(teste1, (file1, file1))
print(file1)
However, it's still printing out the same as before: {'item1': {'relevant': False}, 'item2': {'relevant': False}}
Could one noble soul help me out with this?
There are multiple issues with your approach:
You are attempting to share a dictionary (file1) across several processes, but each worker only ever receives a copy of it, so changes made inside teste1 never reach the parent's file1. To solve this, refer to: multiprocessing: How do I share a dict among multiple processes? (there is also a short sketch after these points).
You are not really iterating over the keys: you pass the whole (copied) dictionary as _iterable, so teste1 ends up trying to index with a dictionary itself rather than with its keys.
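To make the first point concrete, here is a minimal sketch (illustrative only; update_entry is a placeholder name) showing that a write to a manager.dict() proxy made in a child process is visible to the parent, which is what "properly sharing" the dict means here:

import multiprocessing

def update_entry(shared, key):
    # The proxy forwards this write to the manager process,
    # so the parent process sees the change.
    shared[key] = {'relevant': True}

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        shared = manager.dict({'item1': {'relevant': False}})
        p = multiprocessing.Process(target=update_entry, args=(shared, 'item1'))
        p.start()
        p.join()
        print(dict(shared))  # {'item1': {'relevant': True}}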
Assuming that what you need is a dictionary with values updated by parallel processes, you have two choices:
A. Share the dictionary across processes and iterate over keys like:
for key in file1.keys():
    pool.apply_async(teste1, (file1, key))  # assuming file1 is properly shared, e.g. a manager.dict()
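Put together, option A could look like the following minimal sketch (assuming a manager.dict() is an acceptable way to share file1; this is one possible arrangement, not the only one):

import multiprocessing

def teste1(_dict, key):
    # _dict is a manager proxy here, so this write is visible to the parent
    _dict[key] = {'relevant': True}

if __name__ == '__main__':
    file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}
    with multiprocessing.Manager() as manager:
        shared = manager.dict(file1)
        pool = multiprocessing.Pool(4)
        for key in file1.keys():
            pool.apply_async(teste1, (shared, key))
        pool.close()
        pool.join()
        print(dict(shared))  # {'item1': {'relevant': True}, 'item2': {'relevant': True}}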
B. A simpler approach, where you build the resulting dictionary from the return values of the parallel teste1 calls:
import multiprocessing

def teste1(dict_key):
    # some logic dependent on dict_key
    return {'relevant': True}

file1 = {'item1': {'relevant': False}, 'item2': {'relevant': False}}

pool = multiprocessing.Pool(4)
results = pool.map(teste1, file1.keys())  # one call per key, results returned in input order
pool.close()
pool.join()

# file1.keys() preserves insertion order, so results and file1.keys() correspond one-to-one
file2 = {k: v for k, v in zip(file1.keys(), results)}
print(file2)
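With the stub teste1 above, this prints {'item1': {'relevant': True}, 'item2': {'relevant': True}}; replace the function body with whatever per-key logic you actually need.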