Search code examples
pythonpandasjson-normalize

How to groupby two columns of a dataframe and convert other columns into dict with column header as key


Dataframe:

id      id_2    salary  title   allowance   name
0420    13.28   100000  director    No      Tom
0420    13.28   70000   developer   Yes     Sam
0110    13.12   120000  director    No      Dave
0110    13.12   75000   developer   Yes     shaun 

Groupby id and id_2 and convert the rest of columns to dict with column header

I wrote a loop for this, I think don't think it is python way, Please let me know how to do it with pandas.

Required output:

[{
            "id": 420,
            "id_2": 13.28,
            "attributes":[
                    {   "salary": 100000,
                        "title":"director",
                        "allowance":"No",
                        "name": "Tom"
                    },
                    {   "salary": 70000,
                        "title": "developer",
                        "allowance":"Yes",
                        "name": "Sam"
                    }
                ]
            },
            {
            "id": 110,
            "id_2": 13.12,
            "attributes":[
                    {   "salary": 120000,
                        "title":"director",
                        "allowance":"No",
                        "name": "Dave"
                    },
                    {   "salary": 75000,
                        "title": "developer",
                        "allowance":"Yes",
                        "name": "shaun"
                    }
                ]
            }   
]

Solution

    • There is not a one-liner pandas argument that will provide a list of dicts in the shape you're requesting.
    • Use .groupby to select the groups
      • g is the tuple that represents the values used to groupby
      • d is the dataframe for the groupby value, g
    • Use .iterrows to iterate through the rows of each group
      • Returns index represented by the first _, because it's not needed
      • Returns data, from which the labels in groupby_list are dropped, then convert the remainder to a dict using .to_dict(), and append it to the list, att_list
      • After looping through all rows of the group, assign att_list as the value to group['attributes']
    • Once each group has been iterated through, append the dict, group, to dict_list.
    • dict_list can be converted back to a dataframe with the following:
      • df = pd.json_normalize(dict_list, 'attributes', meta=groupby_list)
    dict_list = list()
    groupby_list = ['id', 'id_2']
    for g, d in df.groupby(groupby_list):
        group = dict(zip(groupby_list, g))
        att_list = list()
        for _, data in d.iterrows():
            data = data.drop(labels=groupby_list)
            att_list.append(data.to_dict())
        group['attributes'] = att_list
        dict_list.append(group)
    

    dict_list:

    [{'attributes': [{'allowance': 'No',
                      'name': 'Dave',
                      'salary': 120000,
                      'title': 'director'},
                     {'allowance': 'Yes',
                      'name': 'shaun',
                      'salary': 75000,
                      'title': 'developer'}],
      'id': 110,
      'id_2': 13.12},
     {'attributes': [{'allowance': 'No',
                      'name': 'Tom',
                      'salary': 100000,
                      'title': 'director'},
                     {'allowance': 'Yes',
                      'name': 'Sam',
                      'salary': 70000,
                      'title': 'developer'}],
      'id': 420,
      'id_2': 13.28}]