Nested JSON in customised format from pandas Dataframe, with added label


Dataframe

import pandas as pd

df = {"UNIT": ["UNIT1", "UNIT1", "UNIT2", "UNIT2"],
      "PROJECT": ["A", "A", "C", "C"],
      "TEAM": [1, 2, 1, 2],
      "NAME": ["FANNY", "KATY", "PERCY", "PETER"],
      "ID": [123, 234, 333, 222]}
data = pd.DataFrame(df)

    UNIT PROJECT  TEAM   NAME   ID
0  UNIT1       A     1  FANNY  123
1  UNIT1       A     2   KATY  234
2  UNIT2       C     1  PERCY  333
3  UNIT2       C     2  PETER  222

Expected output

[
    {
        "UNIT": "UNIT1",
        "PROJECT": "A",
        "TEAM_DETAIL": [
            {
                "TEAM": 1,
                "MEMBER": [
                    {
                        "NAME": "FANNY",
                        "ID": 123
                    }
                ]
            },
            {
                "TEAM": 2,
                "MEMBER": [
                    {
                        "NAME": "KATY",
                        "ID": 234
                    }
                ]
            }
        ]
    },
    {
        "UNIT": "UNIT2",
        "PROJECT": "C",
        "TEAM_DETAIL": [
            {
                "TEAM": 1,
                "MEMBER": [
                    {
                        "NAME": "PERCY",
                        "ID": 333
                    }
                ]
            },
            {
                "TEAM": 2,
                "MEMBER": [
                    {
                        "NAME": "PETER",
                        "ID": 222
                    }
                ]
            }
        ]
    }
]

I would like to group the data by TEAM and show the details of each member within each team. Without the custom labels (e.g. TEAM_DETAIL and MEMBER), this is easy to achieve with .to_dict(). However, I have no idea how to add a label at each level.


Solution

  • Create the MEMBER list with a first groupby on UNIT, PROJECT and TEAM. Then use a second groupby on UNIT and PROJECT to build the TEAM_DETAIL list.

    Full code:

    import pandas as pd

    data = {"UNIT": ["UNIT1", "UNIT1", "UNIT2", "UNIT2"],
            "PROJECT": ["A", "A", "C", "C"],
            "TEAM": [1, 2, 1, 2],
            "NAME": ["FANNY", "KATY", "PERCY", "PETER"],
            "ID": [123, 234, 333, 222]}
    df = pd.DataFrame(data)

    # Named `result` rather than `json` so it does not shadow the standard json module.
    result = (df.groupby(['UNIT', 'PROJECT', 'TEAM'])
                # First level: collapse each team's rows into a list of NAME/ID records.
                .apply(lambda x: x[['NAME', 'ID']].to_dict('records'))
                .reset_index()
                .rename(columns={0: 'MEMBER'})
                .groupby(['UNIT', 'PROJECT'])
                # Second level: collapse each unit/project into a list of TEAM/MEMBER records.
                .apply(lambda x: x[['TEAM', 'MEMBER']].to_dict('records'))
                .reset_index()
                .rename(columns={0: 'TEAM_DETAIL'})
                .to_json(orient='records'))

    print(result)
    
    

    Output:

    [{"UNIT":"UNIT1","PROJECT":"A","TEAM_DETAIL":[{"TEAM":1,"MEMBER":[{"NAME":"FANNY","ID":123}]},{"TEAM":2,"MEMBER":[{"NAME":"KATY","ID":234}]}]},{"UNIT":"UNIT2","PROJECT":"C","TEAM_DETAIL":[{"TEAM":1,"MEMBER":[{"NAME":"PERCY","ID":333}]},{"TEAM":2,"MEMBER":[{"NAME":"PETER","ID":222}]}]}]
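
If named intermediate steps are easier to follow than one long chain, the same two groupby passes can be sketched with a small helper. This is only a minimal sketch and not part of the original answer: the helper name `nest` and the variables `members`, `teams` and `nested` are hypothetical. Round-tripping through `to_json` and `json.loads` is one assumed way to get plain Python objects that `json.dumps` can then indent like the layout shown in the question.

```python
import json

import pandas as pd

data = {"UNIT": ["UNIT1", "UNIT1", "UNIT2", "UNIT2"],
        "PROJECT": ["A", "A", "C", "C"],
        "TEAM": [1, 2, 1, 2],
        "NAME": ["FANNY", "KATY", "PERCY", "PETER"],
        "ID": [123, 234, 333, 222]}
df = pd.DataFrame(data)


def nest(frame, keys, fields, label):
    """Hypothetical helper: one row per `keys` group, with the group's
    `fields` collapsed into a list of records stored under `label`.
    Assumes `keys` holds at least two column names, so each group key
    is a tuple."""
    rows = []
    for key, group in frame.groupby(keys, sort=False):
        row = dict(zip(keys, key))
        row[label] = group[fields].to_dict("records")
        rows.append(row)
    return pd.DataFrame(rows)


# Innermost level first: the members of each team.
members = nest(df, ["UNIT", "PROJECT", "TEAM"], ["NAME", "ID"], "MEMBER")
# Then the teams of each unit/project.
teams = nest(members, ["UNIT", "PROJECT"], ["TEAM", "MEMBER"], "TEAM_DETAIL")

# to_json handles the NumPy integer types; json.loads turns the string
# back into plain lists and dicts so that json.dumps can indent it.
nested = json.loads(teams.to_json(orient="records"))
print(json.dumps(nested, indent=4))
```

Each call to `nest` adds one labelled level, so a deeper hierarchy would need one more call per level, always starting from the innermost grouping.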