I have this data:
list_of_dicts_of_lists = [
{'a': [1,2], 'b': [3,4], 'c': [3,2], 'd': [2,5]},
{'a': [2,2], 'b': [2,2], 'c': [1,6], 'd': [4,7]},
{'a': [2,2], 'b': [5,2], 'c': [3,2], 'd': [2,2]},
{'a': [1,2], 'b': [3,4], 'c': [1,6], 'd': [5,5]},
]
I need this result:
median_dict_of_lists = (
{'a': [1.5,2], 'b': [3,3], 'c': [2,4], 'd': [3,5]}
)
...where each value is the median of the respective column above.
I need the mode dictionary where one exists, and the median dictionary when there is no mode. I was able to do a quick and dirty statistics.mode() by converting each dict to a string, taking the mode of the list of strings, and then using ast.literal_eval(most_common_string) to turn it back into a dict, but I need a column-wise median in cases where there is no mode.
I know how to use statistics.median(); however, the nested notation needed to apply it column-wise in this case is frazzling me.
The data is all floats; I wrote it as ints just to make it easier to read.
You can use the following dictionary comprehension with numpy:
import numpy as np
median_dict_of_lists = {i : list(np.median([x[i] for x in list_of_dicts_of_lists], axis=0))
for i in 'abcd'}
Which returns the same:
{'a': [1.5, 2.0], 'c': [2.0, 4.0], 'b': [3.0, 3.0], 'd': [3.0, 5.0]}
To explain: the dictionary comprehension goes through each key i in ['a', 'b', 'c', 'd']. For each key, np.median([x[i] for x in list_of_dicts_of_lists], axis=0) stacks that key's lists from all of the dicts in your original list and takes the median down each column (that is what axis=0 does). Each result is then assigned to the matching key in the new dictionary.
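If you would rather stay in the standard library, since you mentioned statistics.median(), the same column-wise idea can be sketched with zip. This is only a rough equivalent, and it assumes every dict has the same keys and every list has the same length; the keys are read from the first dict instead of being hard-coded.

```python
from statistics import median

list_of_dicts_of_lists = [
    {'a': [1, 2], 'b': [3, 4], 'c': [3, 2], 'd': [2, 5]},
    {'a': [2, 2], 'b': [2, 2], 'c': [1, 6], 'd': [4, 7]},
    {'a': [2, 2], 'b': [5, 2], 'c': [3, 2], 'd': [2, 2]},
    {'a': [1, 2], 'b': [3, 4], 'c': [1, 6], 'd': [5, 5]},
]

# Assumption: all dicts share the keys of the first one.
keys = list_of_dicts_of_lists[0].keys()

median_dict_of_lists = {
    # zip(*...) regroups the lists for one key into columns:
    # for 'a', [1, 2], [2, 2], [2, 2], [1, 2] becomes (1, 2, 2, 1) and (2, 2, 2, 2).
    key: [median(column) for column in zip(*(d[key] for d in list_of_dicts_of_lists))]
    for key in keys
}

print(median_dict_of_lists)
```

The values should match the numpy result above; the only difference is that the medians come back as plain Python floats.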
There is a good explanation of the dictionary comprehension syntax here, and the documentation for np.median explains the function itself quite well.
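For the "mode where available, otherwise median" part of the question, here is one possible sketch. The helper name mode_or_median is made up for illustration, and it rests on an assumption about what "a mode exists" means: one whole dict occurs more than once and strictly more often than any other. It compares floats for exact equality, so nearly equal values will not count as the same dict.

```python
from collections import Counter
from statistics import median


def mode_or_median(dicts):
    """Return the most common dict if there is a unique one, else the column-wise median.

    Hypothetical helper; assumes all dicts share the same keys and list lengths.
    """
    # Dicts and lists are unhashable, so turn each dict into a hashable fingerprint.
    fingerprints = [tuple(sorted((k, tuple(v)) for k, v in d.items())) for d in dicts]

    # Look at the two most frequent fingerprints to detect ties.
    ranked = Counter(fingerprints).most_common(2)
    top_count = ranked[0][1]
    has_unique_mode = top_count > 1 and (len(ranked) == 1 or ranked[1][1] < top_count)

    if has_unique_mode:
        # Rebuild a normal dict of lists from the winning fingerprint.
        return {k: list(v) for k, v in ranked[0][0]}

    # No unique mode: fall back to the median of each column for each key.
    return {
        k: [median(column) for column in zip(*(d[k] for d in dicts))]
        for k in dicts[0]
    }


with_mode = [{'a': [1.0, 2.0]}, {'a': [1.0, 2.0]}, {'a': [3.0, 4.0]}]
without_mode = [{'a': [1.0, 2.0]}, {'a': [3.0, 4.0]}]

print(mode_or_median(with_mode))     # the repeated dict wins
print(mode_or_median(without_mode))  # no repeats, so the medians are used
```

Using a fingerprint of tuples avoids the round trip through str() and ast.literal_eval(), though that approach should work too if you prefer it.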