Tags: python, pandas, dataframe, data-science, multi-index

Converting a MultiIndex dataframe to a nested dictionary


I have a grouped dataframe as shown in this link:

I want to convert it into a nested dictionary in which 'Dia' is the top-level key. Each 'Dia' value is a dictionary keyed by 'mac_ap', and each 'mac_ap' value is another dictionary whose keys are 'download' and 'upload' and whose values are the corresponding entries of the 'bytes' column.

something like this:


Solution

  • Suppose this is your dataframe:

    import pandas as pd

    df = pd.DataFrame([['2010-12-06', 'MAC_AP_1', 'download', 1],
                       ['2010-12-06', 'MAC_AP_1', 'upload', 2],
                       ['2010-12-06', 'MAC_AP_2', 'download', 3],
                       ['2010-12-06', 'MAC_AP_2', 'upload', 4],
                       ['2020-01-01', 'MAC_AP_3', 'download', 5],
                       ['2020-01-01', 'MAC_AP_3', 'upload', 6],
                       ['2020-01-01', 'MAC_AP_4', 'download', 7],
                       ['2020-01-01', 'MAC_AP_4', 'upload', 8]],
                      columns=['Dia', 'macap', 'transmission', 'bytes'])
    
       Dia         macap     transmission  bytes
    0  2010-12-06  MAC_AP_1  download      1
    1  2010-12-06  MAC_AP_1  upload        2
    2  2010-12-06  MAC_AP_2  download      3
    3  2010-12-06  MAC_AP_2  upload        4
    4  2020-01-01  MAC_AP_3  download      5
    5  2020-01-01  MAC_AP_3  upload        6
    6  2020-01-01  MAC_AP_4  download      7
    7  2020-01-01  MAC_AP_4  upload        8
    

    You need to build a nested dictionary from the dataframe, so group by each column in turn until you reach the innermost level:

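    d = df.groupby('Dia').apply(
        # For each day, build {macap: {transmission: bytes}}
        lambda a: dict(a.groupby('macap').apply(
            lambda x: dict(zip(x['transmission'], x['bytes'])))))
    # Convert the resulting Series (indexed by 'Dia') to the outer dictionary
    d = d.to_dict()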
    

    You first group by 'Dia' and then, within each group, group again by 'macap'. The innermost apply uses zip to pair each 'transmission' value with its 'bytes' value and converts those pairs to a dictionary.

    There are three levels of nesting, so the code performs three conversions to dictionaries: the inner dict(zip(...)), the dict(...) around the inner groupby, and the final to_dict().
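If the nested lambdas are hard to follow, the same three levels can be written as explicit loops. This is only a sketch of an equivalent approach, not the code from the answer: it uses a shortened copy of the sample data, the variable names `nested`, `by_day` and `by_ap` are made up for illustration, and `.tolist()` is added so the values are plain Python integers.

```python
import pandas as pd

# Shortened copy of the sample data: one access point per day.
df = pd.DataFrame(
    [['2010-12-06', 'MAC_AP_1', 'download', 1],
     ['2010-12-06', 'MAC_AP_1', 'upload', 2],
     ['2020-01-01', 'MAC_AP_3', 'download', 5],
     ['2020-01-01', 'MAC_AP_3', 'upload', 6]],
    columns=['Dia', 'macap', 'transmission', 'bytes'])

nested = {}
for dia, by_day in df.groupby('Dia'):             # level 1: one key per day
    nested[dia] = {}
    for macap, by_ap in by_day.groupby('macap'):  # level 2: one key per access point
        # level 3: pair each transmission type with its byte count
        nested[dia][macap] = dict(
            zip(by_ap['transmission'], by_ap['bytes'].tolist()))
```

Each `for` loop corresponds to one `groupby(...).apply(...)` in the one-liner, and the `dict(zip(...))` call is the same innermost conversion.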

    The result would then be this:

    {'2010-12-06': {'MAC_AP_1': {'download': 1, 'upload': 2},
                    'MAC_AP_2': {'download': 3, 'upload': 4}},
     '2020-01-01': {'MAC_AP_3': {'download': 5, 'upload': 6},
                    'MAC_AP_4': {'download': 7, 'upload': 8}}}
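The question starts from an already grouped dataframe, which is not visible here. Assuming it is a Series with a three-level MultiIndex, for example one produced by `groupby([...])['bytes'].sum()`, the nested dictionary could also be built directly from the index tuples. The grouping call, the column name 'macap' (the question writes 'mac_ap') and the names `grouped` and `nested` are assumptions for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    [['2010-12-06', 'MAC_AP_1', 'download', 1],
     ['2010-12-06', 'MAC_AP_1', 'upload', 2],
     ['2010-12-06', 'MAC_AP_2', 'download', 3],
     ['2010-12-06', 'MAC_AP_2', 'upload', 4]],
    columns=['Dia', 'macap', 'transmission', 'bytes'])

# Assumed shape of the grouped data: a Series indexed by
# (Dia, macap, transmission) with the summed bytes as values.
grouped = df.groupby(['Dia', 'macap', 'transmission'])['bytes'].sum()

nested = {}
for (dia, macap, transmission), n_bytes in grouped.items():
    # setdefault creates the intermediate dictionaries on first use
    nested.setdefault(dia, {}).setdefault(macap, {})[transmission] = int(n_bytes)
```

Because every index entry is a `(Dia, macap, transmission)` tuple, a single pass over `grouped.items()` is enough, with no nested `groupby` calls.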