python dictionary multiprocessing

Populating a large dictionary using multiprocessing


I want to build one dictionary from another, pre-existing one using multiprocessing. I have a dictionary called iddic, and I want to create a second dictionary, finaldic, from it.

Here is my code:

import multiprocessing

def create_initial_dictionary():
    #this function creates the large iddic dictionary
    global iddic
    #i write out a template for what iddic looks like for your use
    iddic = {'celia' : 14, 'pierre' : 12, 'picasso' :11, 'pikachu' :19}

def initialize_final_dic():
    global finaldic
    finaldic = {}

def populate_dictionary(names):
    #this function should use iddic to populate finaldic.
    for nam in names:
        finaldic[nam] = 2 * iddic[nam]

if __name__ == "__main__":
    create_initial_dictionary()
    initialize_final_dic()

    list_1 = list(iddic.keys())[:2]
    list_2 = list(iddic.keys())[2:]

    p1 = multiprocessing.Process(target=populate_dictionary, args=(list_1,))
    p2 = multiprocessing.Process(target=populate_dictionary, args=(list_2,))

    p1.start()
    p2.start()

    p1.join()
    p2.join()
    
    print(finaldic)

finaldic should look like this:

finaldic = {'celia' : 28, 'pierre' : 24, 'picasso' :22, 'pikachu' :38}

This is the error I get: NameError: name 'iddic' is not defined


Solution

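  • The NameError occurs because, under the spawn start method (the default on Windows and macOS), each child process re-imports the module without running the if __name__ == "__main__": block, so iddic and finaldic are never created in the child. Even under fork, a child's writes to finaldic would not reach the parent, because each process works on its own copy of the globals. One way around this is to do the multiplication in separate threads, which share memory, and then merge the returned dicts in the main thread: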

    from multiprocessing.pool import ThreadPool
    
    
    def populate_dictionary(dict_, names):
        # Build a new partial dict; the shared input dict is left untouched.
        return {name: 2 * dict_[name] for name in names}
    
    
    input_dic = {'celia': 14, 'pierre': 12, 'picasso': 11, 'pikachu': 19}
    keys = list(input_dic)
    with ThreadPool(processes=2) as pool:
        res1 = pool.apply_async(populate_dictionary, (input_dic, keys[:2]))
        res2 = pool.apply_async(populate_dictionary, (input_dic, keys[2:]))
        # get() blocks until each task finishes; dict union (|) needs Python 3.9+.
        final_dic = res1.get() | res2.get()
    
    print(final_dic)
    

    Output:

    {'celia': 28, 'pierre': 24, 'picasso': 22, 'pikachu': 38}
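
If the work is CPU-bound, threads in a standard CPython build will not run it in parallel, so separate processes may still be what you want. The following is only a sketch of one way to do that, not the only one: each worker receives its slice of iddic as an argument instead of reading a global, returns a partial dict, and the parent merges the pieces. The names double_values, split_dict and build_final_dic are hypothetical helpers introduced here for illustration.

```python
import multiprocessing


def double_values(chunk):
    # Runs in a worker process on its own pickled copy of `chunk`,
    # so it does not depend on any global defined in the parent.
    return {name: 2 * value for name, value in chunk.items()}


def split_dict(d, n_chunks):
    # Hypothetical helper: deal the items round-robin into n_chunks dicts.
    items = list(d.items())
    return [dict(items[i::n_chunks]) for i in range(n_chunks)]


def build_final_dic(iddic, n_workers=2):
    finaldic = {}
    with multiprocessing.Pool(processes=n_workers) as pool:
        # pool.map sends one chunk to each task and collects the results.
        for partial in pool.map(double_values, split_dict(iddic, n_workers)):
            finaldic.update(partial)  # merge happens in the parent process
    return finaldic


if __name__ == "__main__":
    iddic = {'celia': 14, 'pierre': 12, 'picasso': 11, 'pikachu': 19}
    print(build_final_dic(iddic))
```

Because the data travels through function arguments and return values, this does not rely on the start method (fork or spawn). The chunks are pickled and copied to the workers, so for a very large dictionary the copying cost can outweigh the gain unless the per-item work is expensive.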