Search code examples
pythonjsonpandascsvexport-to-csv

Convert JSON to CSV with complex arrays in Python


I have several JSON files with nested data. Utilizing Python, I was able to use pandas to help with that:

import pandas as pd

df = pd.read_json (r'data.json')
export_csv = df.to_csv (r'data.csv', index = None, header=True)

However, this only works for simple JSON files. The ones I have are complex with nested arrays and some of the JSON data is merged under the columns. For example, if we're going to use this sample data:

data.json

[
  {
    "id": 1,
    "name": {
      "english": "Bulbasaur",
      "french": "Bulbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 45,
      "Attack": 49,
      "Defense": 49
    }
  },
  {
    "id": 2,
    "name": {
      "english": "Ivysaur",
      "french": "Herbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 60,
      "Attack": 62,
      "Defense": 63
    }
  }
]

The result ends up like the following:

CSV Output

You can see that any array past the first level is showing it in JSON (e.g. {'english': 'Bulbasaur', 'french': 'Bulbizarre'}). Ideally, it should break those child arrays into a column with the name of the element:

Expected Output

On top of that, the other JSON files have different element names and order. Therefore, the script should catch all of the different element names and then convert them into CSV columns.

How can I achieve this?


Solution

  • check out flatten_json

    from flatten_json import flatten
    dic = [
      {
        "id": 1,
        "name": {
          "english": "Bulbasaur",
          "french": "Bulbizarre"
        },
        "type": [
          "Grass",
          "Poison"
        ],
        "base": {
          "HP": 45,
          "Attack": 49,
          "Defense": 49
        }
      },
      {
        "id": 2,
        "name": {
          "english": "Ivysaur",
          "french": "Herbizarre"
        },
        "type": [
          "Grass",
          "Poison"
        ],
        "base": {
          "HP": 60,
          "Attack": 62,
          "Defense": 63
        }
      }
    ]
    
    dic_flattened = (flatten(d, '.') for d in dic)
    df = pd.DataFrame(dic_flattened)
    

    Output:

       id name.english name.french type.0  type.1  base.HP  base.Attack  base.Defense
    0   1    Bulbasaur  Bulbizarre  Grass  Poison       45           49            49
    1   2      Ivysaur  Herbizarre  Grass  Poison       60           62            63