I have code that reads an Excel file and stores the information in dictionaries. I have to use multiprocessing.Manager() to create the dictionaries, so that I can retrieve the calculation output from a function that I run using multiprocessing.Process. The problem is that when multiprocessing.Manager() and manager.dict() are used to create a dictionary, it takes ~400 times longer than using a plain dict() (and dict() is not a shared-memory structure).
Here is a sample script to verify the difference:
import xlrd
import multiprocessing
import time

def DictManager(inp1, inp2):
    # Each call creates its own Manager, which starts a server process
    manager = multiprocessing.Manager()
    Dict = manager.dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def DictNoManager(inp1, inp2):
    Dict = dict()
    Dict['input1'] = inp1
    Dict['input2'] = inp2
    Dict['Output1'] = None
    Dict['Output2'] = None
    return Dict

def ReadFileManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictManager(inp1, inp2)
        DictList.append(dictionary)
    print 'Done!'

def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictNoManager(inp1, inp2)
        DictList.append(dictionary)
    print 'Done!'

if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'

    start = time.time()
    ReadFileNoManager(excelfile)
    end = time.time()
    print 'Run time NoManager:', end - start, 's'

    start = time.time()
    ReadFileManager(excelfile)
    end = time.time()
    print 'Run time Manager:', end - start, 's'
Is there a way to improve the performance of multiprocessing.Manager()? If the answer is no, is there any other shared-memory structure that I can use to replace what I am doing and improve performance? I would appreciate your help!
EDIT:
My main function uses the following code:
def MyFunction(Dict, otherdata):
    # Perform the calculation and save the results in the dictionary
    Dict['Output1'] = Value1
    Dict['Output2'] = Value2

ListOfProcesses = []
for Dict in DictList:
    p = multiprocessing.Process(target=MyFunction, args=(Dict, otherdata))
    p.start()
    ListOfProcesses.append(p)
for p in ListOfProcesses:
    p.join()
If I do not use the manager, I will not be able to retrieve the Outputs.
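Would something like a multiprocessing.Queue work instead? A rough sketch of what I have in mind (MyFunctionQueue, result_queue, and the tuple layout are just illustrative names, and DictList/otherdata are as above):

import multiprocessing

def MyFunctionQueue(index, Dict, otherdata, result_queue):
    # placeholder for the real calculation
    Value1, Value2 = None, None
    # send the outputs back to the parent, tagged with the row index
    result_queue.put((index, Value1, Value2))

if __name__ == '__main__':
    result_queue = multiprocessing.Queue()
    processes = []
    for i, Dict in enumerate(DictList):  # DictList built with plain dict()
        p = multiprocessing.Process(target=MyFunctionQueue,
                                    args=(i, Dict, otherdata, result_queue))
        p.start()
        processes.append(p)
    # drain the queue before joining, so children are not blocked on a full pipe
    for _ in processes:
        index, out1, out2 = result_queue.get()
        DictList[index]['Output1'] = out1
        DictList[index]['Output2'] = out2
    for p in processes:
        p.join()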
As I mentioned in the comments, I recommend using the main process to read in the Excel file, and multiprocessing only for the function calls. Every multiprocessing.Manager() call starts a separate server process, and each access to a managed dict is an IPC round trip, so creating one manager per row is what makes your version so slow. Just add your function body to apply_function and make sure it returns whatever you want; results will contain a list of your results.
Update: I changed map to starmap to include your extra argument (note that Pool.starmap requires Python 3.3+).
from itertools import repeat
import multiprocessing
import xlrd

def ReadFileNoManager(excelfile):
    DictList = []
    book = xlrd.open_workbook(excelfile)
    sheet = book.sheet_by_index(0)
    for line in range(2, sheet.nrows):
        inp1 = sheet.cell(line, 2).value
        inp2 = sheet.cell(line, 3).value
        dictionary = DictNoManager(inp1, inp2)  # DictNoManager as in your question
        DictList.append(dictionary)
    print('Done!')
    return DictList

def apply_function(your_dict, otherdata):
    pass  # your calculation goes here; return whatever you want collected

if __name__ == '__main__':
    excelfile = 'MyFile.xlsx'
    dict_list = ReadFileNoManager(excelfile)
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    # otherdata as in your code; repeat() pairs it with every dict
    results = pool.starmap(apply_function, zip(dict_list, repeat(otherdata)))
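For instance, a minimal stand-in for apply_function (the assignments below are placeholders, not your real calculation): each worker receives an ordinary dict, fills in the outputs, and the return value is pickled back to the parent, so no shared memory is needed.

def apply_function(your_dict, otherdata):
    # placeholder calculation: replace with your real logic
    your_dict['Output1'] = your_dict['input1']
    your_dict['Output2'] = your_dict['input2']
    return your_dict

results will then be the list of completed dictionaries, in the same order as dict_list.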