I have several JSON files with nested data. Using Python, I was able to convert them with pandas:
import pandas as pd

df = pd.read_json(r'data.json')
df.to_csv(r'data.csv', index=False, header=True)
However, this only works for simple JSON files. Mine are more complex, with nested objects and arrays merged under single columns. For example, take this sample data:
data.json
[
  {
    "id": 1,
    "name": {
      "english": "Bulbasaur",
      "french": "Bulbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 45,
      "Attack": 49,
      "Defense": 49
    }
  },
  {
    "id": 2,
    "name": {
      "english": "Ivysaur",
      "french": "Herbizarre"
    },
    "type": [
      "Grass",
      "Poison"
    ],
    "base": {
      "HP": 60,
      "Attack": 62,
      "Defense": 63
    }
  }
]
With that code, any object past the first level ends up rendered as raw JSON in a single cell (e.g. {'english': 'Bulbasaur', 'french': 'Bulbizarre'}). Ideally, each nested object should be broken out into its own column, named after the element.
On top of that, my other JSON files have different element names and orders, so the script should collect all of the distinct element names and turn each one into a CSV column.
How can I achieve this?
Check out the flatten_json package (pip install flatten_json):
import pandas as pd
from flatten_json import flatten
dic = [
    {
        "id": 1,
        "name": {
            "english": "Bulbasaur",
            "french": "Bulbizarre"
        },
        "type": [
            "Grass",
            "Poison"
        ],
        "base": {
            "HP": 45,
            "Attack": 49,
            "Defense": 49
        }
    },
    {
        "id": 2,
        "name": {
            "english": "Ivysaur",
            "french": "Herbizarre"
        },
        "type": [
            "Grass",
            "Poison"
        ],
        "base": {
            "HP": 60,
            "Attack": 62,
            "Defense": 63
        }
    }
]
dic_flattened = (flatten(d, '.') for d in dic)
df = pd.DataFrame(dic_flattened)
Output:
id name.english name.french type.0 type.1 base.HP base.Attack base.Defense
0 1 Bulbasaur Bulbizarre Grass Poison 45 49 49
1 2 Ivysaur Herbizarre Grass Poison 60 62 63
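If pulling in an extra dependency isn't an option, the same idea can be sketched with only the standard library. The helper below (the name `flatten` and the '.' separator are my own choices, mirroring flatten_json's output) recursively walks dicts and lists, and the union of keys across all records becomes the CSV header, so files with differing element names still line up:

```python
import csv
import json

def flatten(obj, parent_key='', sep='.'):
    """Recursively flatten nested dicts/lists; list items are keyed by index."""
    if isinstance(obj, dict):
        pairs = obj.items()
    elif isinstance(obj, list):
        pairs = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {parent_key: obj}
    flat = {}
    for key, value in pairs:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(value, new_key, sep))
    return flat

# Inline copy of the sample data; in practice you'd json.load() each file.
records = [flatten(d) for d in json.loads('''[
    {"id": 1, "name": {"english": "Bulbasaur", "french": "Bulbizarre"},
     "type": ["Grass", "Poison"],
     "base": {"HP": 45, "Attack": 49, "Defense": 49}},
    {"id": 2, "name": {"english": "Ivysaur", "french": "Herbizarre"},
     "type": ["Grass", "Poison"],
     "base": {"HP": 60, "Attack": 62, "Defense": 63}}
]''')]

# Union of keys across all records -> CSV columns, in first-seen order.
fieldnames = list(dict.fromkeys(k for r in records for k in r))

with open('data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
```

Records that lack a column simply get an empty cell from DictWriter, which covers the "different element names per file" requirement without any special casing.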