
Shared memory dictionary creation too slow using multiprocessing.Manager()


I have code that needs to read an Excel file and store the information in dictionaries.

I have to use multiprocessing.Manager() to create the dictionaries so that I can retrieve calculation output from a function that I run using multiprocessing.Process.

The problem is that creating a dictionary with multiprocessing.Manager() and manager.dict() takes ~400 times longer than using a plain dict() (which is not a shared-memory structure).

Here is sample code to verify the difference:

import xlrd
import multiprocessing
import time

def DictManager(inp1, inp2):
    manager = multiprocessing.Manager()
    Dict = manager.dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def DictNoManager(inp1, inp2):
    Dict = dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def ReadFileManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2,sheet.nrows):
        inp1 = sheet.cell(line,2).value
        inp2 = sheet.cell(line,3).value
        dictionary = DictManager(inp1, inp2)
        DictList.append(dictionary)
    print('Done!')

def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2,sheet.nrows):
        inp1 = sheet.cell(line,2).value
        inp2 = sheet.cell(line,3).value
        dictionary = DictNoManager(inp1, inp2)
        DictList.append(dictionary)
    print('Done!')


if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'

    start = time.time()
    ReadFileNoManager(excelfile)
    end = time.time()
    print('Run time NoManager:', end - start, 's')

    start = time.time()
    ReadFileManager(excelfile)
    end = time.time()
    print('Run time Manager:', end - start, 's')

Is there a way to improve the performance of multiprocessing.Manager()?

If the answer is No, is there any other shared memory structure that I can use to replace what I am doing and improve performance?

I would appreciate your help!

EDIT:

My main function uses the following code:

def MyFunction(Dictionary, otherdata):
    # Perform calculation and save results in the dictionary
    Dictionary['Output1'] = Value1
    Dictionary['Output2'] = Value2

ListOfProcesses = []
for Dict in DictList:
    p = multiprocessing.Process(target=MyFunction, args=(Dict, otherdata))
    p.start()
    ListOfProcesses.append(p)  
for p in ListOfProcesses:
    p.join()

If I do not use the manager, I will not be able to retrieve the Outputs.
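
For reference, here is a minimal sketch (not part of my original code, and the value is made up) of why a plain dict does not work: each child process operates on its own copy of the dictionary, so writes made in the child never reach the parent.

import multiprocessing

def worker(d):
    d['Output1'] = 42  # modifies the child's copy only

if __name__ == '__main__':
    plain = {'Output1': None}
    p = multiprocessing.Process(target=worker, args=(plain,))
    p.start()
    p.join()
    print(plain['Output1'])  # prints None: the parent's dict is unchanged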


Solution

  • As I mentioned in the comments, I recommend using the main process to read in the Excel file, then using multiprocessing for the function calls. Just add your function to apply_function and make sure it returns whatever you want; results will contain a list of your results.

    Update: I changed map to starmap to include your extra argument.

    import xlrd
    import multiprocessing
    from itertools import repeat

    def ReadFileNoManager(excelfile):
        DictList = []
        book = xlrd.open_workbook(excelfile)
        sheet = book.sheet_by_index(0)
        for line in range(2,sheet.nrows):
            inp1 = sheet.cell(line,2).value
            inp2 = sheet.cell(line,3).value
            dictionary = DictNoManager(inp1, inp2)
            DictList.append(dictionary)
        print('Done!')
        return DictList
    
    def apply_function(your_dict, otherdata):
        # Perform your calculation here and return whatever you need
        pass
    
    if __name__ == '__main__':
        excelfile = 'MyFile.xlsx'
        otherdata = None  # placeholder: define your extra argument here
        dict_list = ReadFileNoManager(excelfile)
        pool = multiprocessing.Pool(multiprocessing.cpu_count())
        results = pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))
        pool.close()
        pool.join()
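
    For illustration, a hypothetical body for apply_function might look like this (the assignments are made-up placeholders; the point is that each worker returns a plain dict, so the pool collects all outputs in results without any shared memory):

    def apply_function(your_dict, otherdata):
        # Hypothetical calculation, just to show the pattern:
        # fill in the output keys and return the dict
        your_dict['Output1'] = your_dict['input1']  # placeholder value
        your_dict['Output2'] = your_dict['input2']  # placeholder value
        return your_dict

    Because the dicts travel to the workers as arguments and come back as return values, no Manager is needed, which avoids the per-operation IPC overhead that made your original version slow.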