python json pandas dataframe normalize

How to flatten a JSON to a wide format, in pandas


I have the following JSON response:

response = {
  "classifier_id": "xxxxx-xx-1",
  "url": "/testers/xxxxx-xx-1",
  "collection": [
    {
      "text": "How hot will it be today?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 0.993
        },
        {
          "class_name": "conditions",
          "confidence": 0.006
        }
      ]
    },
    {
      "text": "Is it hot outside?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 1.0
        },
        {
          "class_name": "conditions",
          "confidence": 0.0
        }
      ]
    }
  ]
}

Current output: [image not included]

Code and undesired output: [image not included]

I tried json_normalize, but the result contains duplicated rows.

How can I convert this JSON to a pandas DataFrame?

Each record in collection should be expanded into a single wide row, not into multiple long rows.

Desired result: [image not included]


Solution

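  • Apply a flatten_json helper, which recursively flattens nested dicts and lists into a single-level dict, to each record in response['collection']:

    import pandas as pd

    df = pd.DataFrame([flatten_json(x) for x in response['collection']])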
    
    # display(df)
                            text    top_class classes_0_class_name  classes_0_confidence classes_1_class_name  classes_1_confidence
    0  How hot will it be today?  temperature          temperature                 0.993           conditions                 0.006
    1         Is it hot outside?  temperature          temperature                 1.000           conditions                 0.000
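
The flatten_json helper used above is not defined in this answer. The following is a minimal sketch of one possible implementation, not necessarily the one the answer had in mind. It assumes nested keys are joined with an underscore and list items are keyed by their index, which matches column names such as classes_0_class_name in the output above. The `sep` parameter and the inner `_walk` function are illustrative names.

```python
def flatten_json(nested, sep="_"):
    """Flatten nested dicts and lists into a single-level dict.

    Key parts are joined with `sep`; list items use their index as the key part.
    This is an assumed implementation, chosen to reproduce the column names
    shown in the answer's output.
    """
    flat = {}

    def _walk(value, prefix):
        if isinstance(value, dict):
            for key, child in value.items():
                _walk(child, f"{prefix}{key}{sep}")
        elif isinstance(value, list):
            for index, child in enumerate(value):
                _walk(child, f"{prefix}{index}{sep}")
        else:
            # Leaf value: strip the trailing separator from the accumulated key.
            flat[prefix[: len(prefix) - len(sep)]] = value

    _walk(nested, "")
    return flat


# One record taken from the question's "collection" list.
record = {
    "text": "Is it hot outside?",
    "top_class": "temperature",
    "classes": [
        {"class_name": "temperature", "confidence": 1.0},
        {"class_name": "conditions", "confidence": 0.0},
    ],
}

flat_record = flatten_json(record)
print(flat_record)
```

Because each record becomes one flat dict, a list of such dicts passed to pd.DataFrame yields one wide row per record. Records with different numbers of classes would produce NaN in the columns they lack.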