Tags: python, python-3.x, dictionary, ordereddictionary

Multi-level dictionary structure from DataFrame


I want to build a multi-level (nested) dictionary from a DataFrame in a specific format.

Generic input, with the Date column converted to strings (immutable):

    cleaned['Date'] = cleaned['Date'].astype(str)
    cleaned.tail(3)

With the corresponding output:

    Date          Name1_attribute1 Name1_attribute2 Name2_attribute1 Name2_attribute2
    29/06/2020    11.04            97.30            19.67            94.28
    30/06/2020    11.05            97.38            19.68            94.31
    01/07/2020    11.06            97.46            19.61            93.95
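To make the example reproducible, here is a minimal sketch that rebuilds the three rows above. It assumes Date started out as a real datetime64 column; in that case `astype(str)` produces ISO-style strings such as '2020-06-29', and `Series.dt.strftime('%d/%m/%Y')` is one way to get the day/month/year strings shown in the table.

```python
import pandas as pd

# Rebuild the three rows shown above. Assumption: Date starts out as a
# datetime64 column rather than as text.
cleaned = pd.DataFrame({
    'Date': pd.to_datetime(['29/06/2020', '30/06/2020', '01/07/2020'], format='%d/%m/%Y'),
    'Name1_attribute1': [11.04, 11.05, 11.06],
    'Name1_attribute2': [97.30, 97.38, 97.46],
    'Name2_attribute1': [19.67, 19.68, 19.61],
    'Name2_attribute2': [94.28, 94.31, 93.95],
})

# astype(str) renders midnight-only datetimes in ISO order, e.g. '2020-06-29'.
iso_dates = cleaned['Date'].astype(str)

# strftime gives the day/month/year strings shown in the table.
cleaned['Date'] = cleaned['Date'].dt.strftime('%d/%m/%Y')
print(cleaned.tail(3))
```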

I am trying to obtain the following dictionary structure (the real data has many more rows and columns):

    {'Name_1': {
        '29/06/2020': {
            'Fixed String Attribute 1': 11.04,
            'Second Fixed String Attribute 2': 97.30},
        '30/06/2020': {
            'Fixed String Attribute 1': 11.05,
            'Second Fixed String Attribute 2': 97.38},
        '01/07/2020': {
            'Fixed String Attribute 1': 11.06,
            'Second Fixed String Attribute 2': 97.46}},
     'Name_2': {
        '29/06/2020': {
            'Fixed String Attribute 1': 19.67,
            'Second Fixed String Attribute 2': 94.28},
        '30/06/2020': {
            'Fixed String Attribute 1': 19.68,
            'Second Fixed String Attribute 2': 94.31},
        '01/07/2020': {
            'Fixed String Attribute 1': 19.61,
            'Second Fixed String Attribute 2': 93.95}}}
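One way to produce this shape directly is sketched below, under a few assumptions: column names always follow the `<Name>_<attribute>` pattern, and the `ATTRIBUTE_LABELS` mapping (hypothetical) supplies the fixed strings. Splitting the column names on the first '_' gives a two-level column index; each top-level name is then selected, relabelled, and converted with `to_dict(orient='index')`, which yields `{date: {label: value}}`. The outer keys come out as the column prefixes ('Name1', 'Name2'); rename them afterwards if you need 'Name_1' style keys.

```python
import pandas as pd

# Sample data mirroring the three rows in the question (Date already a string).
cleaned = pd.DataFrame({
    'Date': ['29/06/2020', '30/06/2020', '01/07/2020'],
    'Name1_attribute1': [11.04, 11.05, 11.06],
    'Name1_attribute2': [97.30, 97.38, 97.46],
    'Name2_attribute1': [19.67, 19.68, 19.61],
    'Name2_attribute2': [94.28, 94.31, 93.95],
})

# Hypothetical mapping from column suffix to the fixed output strings.
ATTRIBUTE_LABELS = {
    'attribute1': 'Fixed String Attribute 1',
    'attribute2': 'Second Fixed String Attribute 2',
}

wide = cleaned.set_index('Date')
# Split 'Name1_attribute1' into a two-level column index: ('Name1', 'attribute1').
wide.columns = wide.columns.str.split('_', n=1, expand=True)

# For each top-level name: relabel the attribute columns, then turn each
# row (one date) into an inner dict keyed by the fixed strings.
result = {
    name: wide[name].rename(columns=ATTRIBUTE_LABELS).to_dict(orient='index')
    for name in wide.columns.get_level_values(0).unique()
}
print(result)
```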

I have read the docs for DataFrame.to_dict and OrderedDict and searched SO, but could not find a similar question.

Any advice on achieving the desired output is greatly appreciated!


Solution

  • You can try out something like this:

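    # df is the question's `cleaned` DataFrame, with Date already converted to strings
    d = {}
    data = df.set_index('Date').T   # rows: 'Name1_attribute1', ...; columns: dates
    # Group rows by the prefix before the first '_' ('Name1', 'Name2', ...);
    # unlike str[:5], this also works for prefixes of other lengths
    grp = data.groupby(data.index.str.split('_').str[0])

    for name, group in grp:
        d[name] = group.to_dict()   # {date: {column_name: value}}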
    

    d:

    {'Name1': {'29/06/2020': {'Name1_attribute1': 11.04, 'Name1_attribute2': 97.3},
      '30/06/2020': {'Name1_attribute1': 11.05, 'Name1_attribute2': 97.38},
      '01/07/2020': {'Name1_attribute1': 11.06, 'Name1_attribute2': 97.46}},
     'Name2': {'29/06/2020': {'Name2_attribute1': 19.67,
       'Name2_attribute2': 94.28},
      '30/06/2020': {'Name2_attribute1': 19.68, 'Name2_attribute2': 94.31},
      '01/07/2020': {'Name2_attribute1': 19.61, 'Name2_attribute2': 93.95}}}
    

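    Then rename the inner keys (for example 'Name1_attribute1' to 'Fixed String Attribute 1') to the fixed strings you need.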
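The renaming step can be done in plain Python once `d` exists. This is a minimal sketch, assuming every inner key has the form `<Name>_<attribute>`; the `LABELS` mapping and the `rename_inner_keys` helper are hypothetical names chosen for illustration, and only a slice of `d` is reproduced here. The helper builds a new dict rather than mutating `d`.

```python
def rename_inner_keys(nested, labels):
    """Relabel the innermost keys of a {name: {date: {column: value}}} dict.

    `labels` maps the column suffix (text after the first '_') to the new key.
    Returns a new dict; `nested` is left unchanged.
    """
    return {
        name: {
            date: {labels[column.split('_', 1)[1]]: value
                   for column, value in attributes.items()}
            for date, attributes in dates.items()
        }
        for name, dates in nested.items()
    }


# Hypothetical suffix-to-label mapping taken from the question's target structure.
LABELS = {
    'attribute1': 'Fixed String Attribute 1',
    'attribute2': 'Second Fixed String Attribute 2',
}

# A slice of the `d` printed above.
d = {
    'Name1': {'29/06/2020': {'Name1_attribute1': 11.04, 'Name1_attribute2': 97.3}},
    'Name2': {'29/06/2020': {'Name2_attribute1': 19.67, 'Name2_attribute2': 94.28}},
}

renamed = rename_inner_keys(d, LABELS)
print(renamed)
```

The outer keys ('Name1', 'Name2') can be renamed the same way with a second mapping if you need a different naming scheme.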