I'm processing a list of dictionaries in Python like so:
def process_results(list_of_dicts):
    first_result, second_result, count = [], [], 0
    for dictionary in list_of_dicts:
        first_result.append(dictionary)
        if 'pi' in dictionary:
            second_result.append(dictionary)
        count += 1
    print(second_result, first_result)
Next, following this simple SO example of using multiprocessing in a for loop, I tried the following, which gives completely wrong results:
from multiprocessing import Pool

def process_results(list_of_dicts):
    first_result, second_result, count = [], [], 0
    for dictionary in list_of_dicts:
        first_result.append(dictionary)
        if 'pi' in dictionary:
            second_result.append(dictionary)
        count += 1
    return second_result, first_result

if __name__ == '__main__':
    list_of_dictionaries = # a list of dictionaries
    pool = Pool()
    print(pool.map(process_results, list_of_dictionaries))
Why is this wrong? An illustrative example would be nice.
What you're probably looking for is this:
from multiprocessing import Pool

def process_results(single_dict):
    # Each call receives ONE dictionary, so these lists are local to that call.
    first_result, second_result, count = [], [], 0
    first_result.append(single_dict)
    if 'pi' in single_dict:
        second_result.append(single_dict)
    count += 1
    return first_result, second_result

if __name__ == '__main__':
    lst_dict = [{'a': 1, 'b': 2, 'c': 3}, {'c': 4, 'pi': 3.14},
                {'pi': '3.14', 'not pi': 8.3143}, {'sin(pi)': 0, 'cos(pi)': 1}]
    pool = Pool()
    print(pool.map(process_results, lst_dict))
pool.map executes process_results for each element in the iterable lst_dict. Since lst_dict is a list of dictionaries, process_results is called once for every dictionary in lst_dict, with that dictionary as its argument. process_results therefore processes one dictionary at a time rather than the whole list.
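To see why the version in the question misbehaves, it helps to call the list-oriented function on a single dictionary, which is what pool.map does for each element. The sketch below is written for Python 3 (3.7 or later, so dictionaries keep insertion order) and uses a plain call instead of a pool. The name process_results_from_question is made up here to distinguish it from the corrected function.

```python
def process_results_from_question(list_of_dicts):
    # Same loop as the function in the question, minus the unused counter
    # (the function name is hypothetical).
    first_result, second_result = [], []
    for dictionary in list_of_dicts:
        first_result.append(dictionary)
        if 'pi' in dictionary:
            second_result.append(dictionary)
    return second_result, first_result


# pool.map hands the function ONE dictionary, so the loop walks over its keys.
keys_with_pi, all_keys = process_results_from_question({'c': 4, 'pi': 3.14})
print(all_keys)      # ['c', 'pi']  (keys, not dictionaries)
print(keys_with_pi)  # ['pi']

# With string keys, "'pi' in key" is a substring test, so this matches too.
substring_hits, _ = process_results_from_question({'sin(pi)': 0, 'cos(pi)': 1})
print(substring_hits)  # ['sin(pi)', 'cos(pi)']
```

So each worker ends up collecting key strings rather than dictionaries, and keys that merely contain the text 'pi' are counted as matches.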
process_results in this program is changed accordingly: for a given dictionary in the list, it appends the dictionary to the first_result list and then appends the dictionary to the second_result list if the 'pi' key exists. Each call returns a tuple of two lists: one containing the dictionary, and one that contains the same dictionary if a 'pi' key was found and is empty otherwise.
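As a rough illustration of those return values, the sketch below applies the same per-dictionary function with the built-in map. It is used here only as a single-process stand-in, since pool.map also returns its results in input order. The shortened sample list is an assumption made for the example.

```python
def process_results(single_dict):
    # Per-dictionary version: returns (first_result, second_result).
    first_result, second_result = [], []
    first_result.append(single_dict)
    if 'pi' in single_dict:
        second_result.append(single_dict)
    return first_result, second_result


# Hypothetical sample data, shorter than the list in the answer.
sample = [{'a': 1}, {'c': 4, 'pi': 3.14}, {'sin(pi)': 0}]
results = list(map(process_results, sample))
for pair in results:
    print(pair)
# ([{'a': 1}], [])
# ([{'c': 4, 'pi': 3.14}], [{'c': 4, 'pi': 3.14}])
# ([{'sin(pi)': 0}], [])
```

Note that {'sin(pi)': 0} yields an empty second list: on a dictionary, 'pi' in single_dict checks for an exact key, not a substring.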
All this can be modified if, for instance, you need the first_result and second_result lists to be shared among processes.
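One possible way to share the lists, sketched here under the assumption that a multiprocessing.Manager is acceptable for your workload, is to create the lists through the manager and bind them to the worker with functools.partial. The function name record and the two-item sample list are hypothetical, and the code is written for Python 3.

```python
from functools import partial
from multiprocessing import Manager, Pool


def record(shared_first, shared_second, single_dict):
    # Append to lists that may be manager proxies shared across processes.
    shared_first.append(single_dict)
    if 'pi' in single_dict:
        shared_second.append(single_dict)


if __name__ == '__main__':
    lst_dict = [{'a': 1, 'b': 2, 'c': 3}, {'c': 4, 'pi': 3.14}]
    with Manager() as manager:
        shared_first = manager.list()
        shared_second = manager.list()
        with Pool() as pool:
            # partial fixes the two shared lists; map supplies each dictionary.
            pool.map(partial(record, shared_first, shared_second), lst_dict)
        # Copy out of the proxies before the manager shuts down.
        print(list(shared_first))
        print(list(shared_second))
```

The order of items in the shared lists depends on which worker finishes first, so it may differ from the order of lst_dict. Every append also goes through the manager process, which can be slower than returning values from pool.map.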
For a better picture of how pool.map() works, look at the first example in the documentation.
To retrieve the results in their original/target form of two lists you can collect the data into a list and then process it:
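# Each element of results is a (first_result, second_result) tuple for one dictionary.
results = pool.map(process_results, lst_dict)
first_result = [i[0][0] for i in results]
# i[1] is non-empty only when the dictionary has a 'pi' key.
second_result = [i[0][0] for i in results if i[1]]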
results is a list of tuples. Each tuple is the result of processing one dictionary: the first element is a one-item list holding the whole dictionary, and the second is either an empty list or a one-item list holding the same dictionary if a 'pi' key was found. The remaining two lines unpack that data into the first_result and second_result lists.
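The same unpacking can be wrapped in a small helper. This is only a sketch: split_results is a hypothetical name, and the hard-coded results list stands in for the value pool.map would return.

```python
from itertools import chain


def split_results(results):
    # Transpose [(first, second), ...] into (all firsts, all seconds).
    first_parts, second_parts = zip(*results) if results else ((), ())
    # Flatten the per-dictionary lists into two flat lists.
    first_result = list(chain.from_iterable(first_parts))
    second_result = list(chain.from_iterable(second_parts))
    return first_result, second_result


# Stand-in for the output of pool.map(process_results, lst_dict).
results = [([{'a': 1}], []), ([{'pi': 3.14}], [{'pi': 3.14}])]
first_result, second_result = split_results(results)
print(first_result)   # [{'a': 1}, {'pi': 3.14}]
print(second_result)  # [{'pi': 3.14}]
```

Flattening with chain.from_iterable avoids the i[0][0] indexing and keeps working if a call ever returns more than one item per list.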