Search code examples
pythonpandasjsonb

Converting pandas dataframe to JSON Object Column


I have a pandas dataframe that has information about a user with multiple orders and within each order there are multiple items purchases. An example of the dataframe format:

user_id | order_num | item_id | item_desc 
    1        1         1         red
    1        1         2         blue
    1        1         3         green

I want to convert it to JSONb Object in a column so that I can query it in postgresql. Currently I am using the following code:

j = (reg_test.groupby(['user_id', 'order_num'], as_index=False)
             .apply(lambda x: x[['item_id','item_desc']].to_dict('r'))
             .reset_index()
             .rename(columns={0:'New-Data'})
             .to_json(orient='records'))

This is the result I am getting:

'''
[
          {
            "New-Data": [
              {
                "item_id": "1",
                "item_desc": "red",
              },
              {
                "item_id": "2",
                "item_desc": "blue",
              },
              {
                "item_id": "3",
                "item_desc": "green",
              }
            ],
            "order_number": "1",
            "user_id": "1"
          }
        ]
'''

While that is correct json format, I want the result to look like this:

'''

[
  {
    "New-Data": [{
"1":
      {
        "item_id": "1",
        "item_desc": "red",
      },
"2": {
        "item_id": "2",
        "item_desc": "blue",
      },
"3":
      {
        "item_id": "3",
        "item_desc": "green",
      }
     }
    ],
    "order_number": "1",
    "user_id": "1"
  }
]
'''

Solution

  • Have you considered to use a custom function

    import pandas as pd
    
    df = pd.DataFrame({'user_id': {0: 1, 1: 1, 2: 1},
     'order_num': {0: 1, 1: 1, 2: 1},
     'item_id': {0: 1, 1: 2, 2: 3},
     'item_desc': {0: 'red', 1: 'blue', 2: 'green'}})
    
    out = df.groupby(['user_id', 'order_num'])[["item_id", "item_desc"]]\
            .apply(lambda x: x.to_dict("records"))\
            .apply(lambda x: [{str(l["item_id"]):l for l in x}])\
            .reset_index(name="New-Data")\
            .to_dict("records")
    

    where out returns

    [{'user_id': 1,
      'order_num': 1,
      'New-Data': [{'1': {'item_id': 1, 'item_desc': 'red'},
        '2': {'item_id': 2, 'item_desc': 'blue'},
        '3': {'item_id': 3, 'item_desc': 'green'}}]}]