
I need a column-wise median of list_of_dicts_of_lists in Python


I have this data:

list_of_dicts_of_lists = [
    {'a': [1,2], 'b': [3,4], 'c': [3,2], 'd': [2,5]},
    {'a': [2,2], 'b': [2,2], 'c': [1,6], 'd': [4,7]},
    {'a': [2,2], 'b': [5,2], 'c': [3,2], 'd': [2,2]},
    {'a': [1,2], 'b': [3,4], 'c': [1,6], 'd': [5,5]},
    ]

I need this result:

median_dict_of_lists = (
    {'a': [1.5,2], 'b': [3,3], 'c': [2,4], 'd': [3,5]}
    )

...where each value is the median of the respective column above.

I need the mode dictionary where one is available and the median dictionary when no mode exists. I was able to do a quick and dirty statistics.mode() by converting each dict to a string, getting the mode of the list of strings, then using ast.literal_eval(most_common_string) to turn it back into a dict, but I need a column-wise median in cases where there is no mode.

I know how to use statistics.median(); however, the nested notation needed to apply it column-wise in this case is frazzling me.

The data is all floats; I wrote it as ints just to make it easier to read.


Solution

  • You can use the following dictionary comprehension with numpy:

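    import numpy as np
    # For each key, stack that key's lists from every dict into rows,
    # then take the median down each column (axis=0).
    # .tolist() converts the resulting array to a list of plain Python floats.
    median_dict_of_lists = {i: np.median([x[i] for x in list_of_dicts_of_lists], axis=0).tolist()
                            for i in 'abcd'}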
    

    This returns the result you asked for:

    {'a': [1.5, 2.0], 'b': [3.0, 3.0], 'c': [2.0, 4.0], 'd': [3.0, 5.0]}
    

    To explain: the dictionary comprehension loops over each key i in ['a', 'b', 'c', 'd']. For each key, [x[i] for x in list_of_dicts_of_lists] collects that key's list from every dict in your original list, giving one row per dict. np.median(..., axis=0) then takes the median down each column of those rows, and the result is assigned to the same key in the new dictionary.

    There is a good explanation of the dictionary comprehension syntax here, and the documentation for np.median explains the function itself quite well.
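Since the question mentions statistics.median(), the same column-wise computation can also be sketched without numpy. This is only an illustration, not part of the answer above: the helper name columnwise_median is made up for this example, and it assumes every dict has the same keys and that the lists under a given key all have the same length. zip(*...) transposes the per-dict lists so that each tuple holds one column.

```python
from statistics import median

list_of_dicts_of_lists = [
    {'a': [1, 2], 'b': [3, 4], 'c': [3, 2], 'd': [2, 5]},
    {'a': [2, 2], 'b': [2, 2], 'c': [1, 6], 'd': [4, 7]},
    {'a': [2, 2], 'b': [5, 2], 'c': [3, 2], 'd': [2, 2]},
    {'a': [1, 2], 'b': [3, 4], 'c': [1, 6], 'd': [5, 5]},
]


def columnwise_median(dicts):
    """Hypothetical helper: median per key and per list position."""
    # Assumption: all dicts share the keys of the first one.
    keys = dicts[0].keys()
    return {
        # zip(*...) groups the first elements together, then the second, etc.
        key: [median(column) for column in zip(*(d[key] for d in dicts))]
        for key in keys
    }


median_dict_of_lists = columnwise_median(list_of_dicts_of_lists)
print(median_dict_of_lists)
```

Taking the keys from the first dict avoids hardcoding 'abcd', at the cost of raising a KeyError if a later dict is missing one of them.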