
Pandas MultiIndex dataframe to nested json


I have the following pandas multi-index dataframe and I would like it to become a nested json object.

import pandas as pd

data = {'store_id' : ['1', '1','1','2','2'],
        'item_name' : ['apples', 'oranges', 'pears', 'persimmons', 'bananas'],
        '2022-01-01': [2.33, 1.99, 2.33, 2.33, 4.21],
        '2022-01-02': [2.38, 1.96, 2.38, 2.37, 4.34],
        '2022-01-03': [2.45, 1.78, 2.45, 2.45, 4.13]}
        
df = pd.DataFrame(data).groupby(['store_id', 'item_name']).first()

print(df)

Output:

# The indices are store_id and item_name.

                     2022-01-01  2022-01-02  2022-01-03
store_id item_name
1        apples            2.33        2.38        2.45
         oranges           1.99        1.96        1.78
         pears             2.33        2.38        2.45
2        bananas           4.21        4.34        4.13
         persimmons        2.33        2.37        2.45

I would like it to become a json object with nested levels that looks like this:

{
  1: {
    apples: [
      { date: "2022-01-01", price: 2.33 },
      { date: "2022-01-02", price: 2.38 },
      { date: "2022-01-03", price: 2.45 },
    ],
    oranges: [
      { date: "2022-01-01", price: 1.99 },
      { date: "2022-01-02", price: 1.96 },
      { date: "2022-01-03", price: 1.78 },
    ],
    pears: [
      { date: "2022-01-01", price: 2.33 },
      { date: "2022-01-02", price: 2.38 },
      { date: "2022-01-03", price: 2.45 },
    ],
  },

  2: {
    persimmons: [
      { date: "2022-01-01", price: 2.33 },
      { date: "2022-01-02", price: 2.37 },
      { date: "2022-01-03", price: 2.45 },
    ],
    bananas: [
      { date: "2022-01-01", price: 4.21 },
      { date: "2022-01-02", price: 4.34 },
      { date: "2022-01-03", price: 4.13 },
    ],
  },
}

I've tried every permutation of pandas's to_json but nothing gives me the nesting I need.


Solution

  • Use a nested comprehension to build the inner dicts in the custom format:

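    import json
    
    # Outer dict: one key per store_id (first index level).
    # Middle dict: one key per item_name within that store.
    # Inner list: one {'date', 'price'} dict per date column.
    d = {level: {item: [{'date': date, 'price': price}
                 for date, price in row.items()]
                 for item, row in df.xs(level).T.items()}
                 for level in df.index.levels[0]}
    
    # Serialize the nested dict to a JSON string.
    j = json.dumps(d)
    print(j)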
    

    Output (the structure of d, reformatted for readability):

    {
        '1': {
            'apples': [{
                'date': '2022-01-01',
                'price': 2.33
            }, {
                'date': '2022-01-02',
                'price': 2.38
            }, {
                'date': '2022-01-03',
                'price': 2.45
            }],
            'oranges': [{
                'date': '2022-01-01',
                'price': 1.99
            }, {
                'date': '2022-01-02',
                'price': 1.96
            }, {
                'date': '2022-01-03',
                'price': 1.78
            }],
            'pears': [{
                'date': '2022-01-01',
                'price': 2.33
            }, {
                'date': '2022-01-02',
                'price': 2.38
            }, {
                'date': '2022-01-03',
                'price': 2.45
            }]
        },
        '2': {
            'bananas': [{
                'date': '2022-01-01',
                'price': 4.21
            }, {
                'date': '2022-01-02',
                'price': 4.34
            }, {
                'date': '2022-01-03',
                'price': 4.13
            }],
            'persimmons': [{
                'date': '2022-01-01',
                'price': 2.33
            }, {
                'date': '2022-01-02',
                'price': 2.37
            }, {
                'date': '2022-01-03',
                'price': 2.45
            }]
        }
    }
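
The same nesting can also be built with a plain loop over the rows, which some may find easier to read than the triple comprehension. The following is a minimal sketch, not the answer's original method: the function name `multiindex_to_nested` is hypothetical, and it assumes a two-level index with the date columns as the only columns. Because it iterates over the rows that are actually present, it does not depend on `df.index.levels[0]`, which can still list values that were filtered out of the frame.

```python
import json

import pandas as pd


def multiindex_to_nested(df):
    """Build {store_id: {item_name: [{'date': ..., 'price': ...}, ...]}}.

    Hypothetical helper; assumes a two-level MultiIndex on the rows.
    """
    nested = {}
    # Each row's index value is a (store_id, item_name) tuple.
    for (store_id, item_name), row in df.iterrows():
        # Column labels are the dates, cell values are the prices.
        records = [{'date': date, 'price': float(price)}
                   for date, price in row.items()]
        nested.setdefault(store_id, {})[item_name] = records
    return nested


# Same sample data as in the question.
data = {'store_id': ['1', '1', '1', '2', '2'],
        'item_name': ['apples', 'oranges', 'pears', 'persimmons', 'bananas'],
        '2022-01-01': [2.33, 1.99, 2.33, 2.33, 4.21],
        '2022-01-02': [2.38, 1.96, 2.38, 2.37, 4.34],
        '2022-01-03': [2.45, 1.78, 2.45, 2.45, 4.13]}
df = pd.DataFrame(data).groupby(['store_id', 'item_name']).first()

nested = multiindex_to_nested(df)
# indent=2 pretty-prints the JSON string instead of emitting one long line.
print(json.dumps(nested, indent=2))
```

Note that `groupby` sorts the index, so within store 2 the items come out as bananas then persimmons rather than in the order they were entered.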