Search code examples
pythonjsonpandasapi

Parse JSON from Api to pandas dataframe with doubled names


I'm parsing a json and I don't understand how to correctly decompose it into a dataframe.

Json structure i have (api response):

{
  "result": {
    "data": [],
    "totals": [
      0
    ]
  },
  "timestamp": "2021-11-25 15:19:21"
}

response_ =

{
   "result":{
      "data":[
         {
            "dimensions":[
               {
                  "id":"2023-01-10",
                  "name":""
               },
               {
                  "id":"123",
                  "name":"good3"
               }
            ],
            "metrics":[
               10,
               20,
               30,
               40
            ]
         },
         {
            "dimensions":[
               {
                  "id":"2023-01-10",
                  "name":""
               },
               {
                  "id":"234",
                  "name":"good2"
               }
            ],
            "metrics":[
               1,
               2,
               3,
               4
            ]
         }
      ],
      "totals":[
         11,
         22,
         33,
         44
      ]
   },
   "timestamp":"2023-02-07 12:58:40"
}

I don't need "timestamp" and "totals" - just "data". So i do:

...
response_ = requests.post(url, headers=head, data=body)
datas = response_.json()
datas_ = datas['result']['data']
df1 = pd.json_normalize(datas_)

I got:

dimensions metrics
0 [{'id': '2023-01-10', 'name': ''}, {'id': '123', 'name': 'good1'}] [10, 20, 30, 40]
1 [{'id': '2023-01-10', 'name': ''}, {'id': '234', 'name': 'good2'}] [1, 2, 3, 4]

But i need dataframe like:

id_ name_ id name metric1 metric2 metric3 metric4
0 2023-01-10 123 good1 10 20 30 40
1 2023-01-10 234 good2 1 2 3 4

When i try like:

df1 = pd.json_normalize(datas_, 'dimensions')

i get all id's and name's in one column.

Explain step by step if possible. Thank you.


Solution

  • Try:

    response = {
        "result": {
            "data": [
                {
                    "dimensions": [
                        {"id": "2023-01-10", "name": ""},
                        {"id": "123", "name": "good3"},
                    ],
                    "metrics": [10, 20, 30, 40],
                },
                {
                    "dimensions": [
                        {"id": "2023-01-10", "name": ""},
                        {"id": "234", "name": "good2"},
                    ],
                    "metrics": [1, 2, 3, 4],
                },
            ],
            "totals": [11, 22, 33, 44],
        },
        "timestamp": "2023-02-07 12:58:40",
    }
    
    tmp = [
        {
            **{f"{k}_": v for k, v in d["dimensions"][0].items()},
            **{k: v for k, v in d["dimensions"][1].items()},
            **{f'metric{i}':m for i, m in enumerate(d['metrics'], 1)}
        }
        for d in response["result"]["data"]
    ]
    
    df = pd.DataFrame(tmp)
    print(df)
    

    Prints:

              id_ name_   id   name  metric1  metric2  metric3  metric4
    0  2023-01-10        123  good3       10       20       30       40
    1  2023-01-10        234  good2        1        2        3        4