Search code examples
pythonjsonpandasdictionaryjson-normalize

Nested JSON Array to Python Pandas DataFrame


I'm trying to expand nested json array in pandas dataframe.

That's the JSON I have:

[ {
        "id": "0001",
        "name": "Stiven",
        "location": [{
                "country": "Colombia",
                "department": "Chocó",
                "city": "Quibdó"
            }, {
                "country": "Colombia",
                "department": "Antioquia",
                "city": "Medellin"
            }, {
                "country": "Colombia",
                "department": "Cundinamarca",
                "city": "Bogotá"
            }
        ]
    }, {
        "id": "0002",
        "name": "Jhon Jaime",
        "location": [{
                "country": "Colombia",
                "department": "Valle del Cauca",
                "city": "Cali"
            }, {
                "country": "Colombia",
                "department": "Putumayo",
                "city": "Mocoa"
            }, {
                "country": "Colombia",
                "department": "Arauca",
                "city": "Arauca"
            }
        ]
    }, {
        "id": "0003",
        "name": "Francisco",
        "location": [{
                "country": "Colombia",
                "department": "Atlántico",
                "city": "Barranquilla"
            }, {
                "country": "Colombia",
                "department": "Bolívar",
                "city": "Cartagena"
            }, {
                "country": "Colombia",
                "department": "La Guajira",
                "city": "Riohacha"
            }
        ]
    }
]

That's the dataframe I have:

index   id    name         location
0       0001  Stiven       [{'country':'Colombia', 'department': 'Chocó', 'city': 'Quibdó'}, {'country':'Colombia', 'department': 'Antioquia', 'city': 'Medellin'}, {'country':'Colombia', 'department': 'Cundinamarca', 'city': 'Bogotá'}]
1       0002  Jhon Jaime   [{'country':'Colombia', 'department': 'Valle del Cauca', 'city': 'Cali'}, {'country':'Colombia', 'department': 'Putumayo', 'city': 'Mocoa'}, {'country':'Colombia', 'department': 'Arauca', 'city': 'Arauca'}]
2       0003  Francisco    [{'country':'Colombia', 'department': 'Atlántico', 'city': 'Barranquilla'}, {'country':'Colombia', 'department': 'Bolívar', 'city': 'Cartagena'}, {'country':'Colombia', 'department': 'La Guajira', 'city': 'Riohacha'}] 

I need to convert to dataframe per id something like this:

index   id    name         country   department       city
0       0001  Stiven       Colombia  Chocó            Quibdó
1       0001  Stiven       Colombia  Antioquia        Medellin
2       0001  Stiven       Colombia  Cundinamarca     Bogotá
3       0002  Jhon Jaime   Colombia  Valle del Cauca  Cali
4       0002  Jhon Jaime   Colombia  Putumayo         Mocoa
5       0002  Jhon Jaime   Colombia  Arauca           Arauca
6       0003  Francisco    Colombia  Atlántico        Barranquilla
7       0003  Francisco    Colombia  Bolívar          Cartagena 
8       0003  Francisco    Colombia  La Guajira       Riohacha   

Solution

    • If the JSON is being loaded from a file, use json.loads, but if the JSON is directly from an API, it may not be necessary.
    • Use pandas.json_normalize with the meta parameter, to convert the JSON into a DataFrame.
    import pandas as pd
    from pathlib import Path
    import json
    
    # path to file
    p = Path(r'c:\path_to_file\test.json')
    
    # read json
    with p.open('r', encoding='utf-8') as f:
        data = json.loads(f.read())
    
    # create dataframe
    df = pd.json_normalize(data, record_path='location', meta=['id', 'name'])
    
    # output
      country       department          city    id        name
     Colombia            Chocó        Quibdó  0001      Stiven
     Colombia        Antioquia      Medellin  0001      Stiven
     Colombia     Cundinamarca        Bogotá  0001      Stiven
     Colombia  Valle del Cauca          Cali  0002  Jhon Jaime
     Colombia         Putumayo         Mocoa  0002  Jhon Jaime
     Colombia           Arauca        Arauca  0002  Jhon Jaime
     Colombia        Atlántico  Barranquilla  0003   Francisco
     Colombia          Bolívar     Cartagena  0003   Francisco
     Colombia       La Guajira      Riohacha  0003   Francisco