Search code examples
jsonpandasnormalize

Pandas json_normalize with nested JSON


I have tried many alternatives and the better result was with this statement:

json_normalize(a['solution'][0]['tour'])

I can see just one block of tours at a time. vehicle_id 0 information and I need them all together. Thank you.

The JSON is:

{
    "total_clusters": 6,
    "solution": [
        {
            "vehicles_id": "0",
            "vehicles_location": {
                "lat": "",
                "lng": ""
            },
            "tour": [
                {
                    "shipping_id": "4a4b0750-63a7-11ea-8955-43fcb2cd860a",
                    "type": "dropoff",
                    "location_id": "797",
                    "coordinates": {
                        "lat": "-34.545736",
                        "lng": "-58.488340"
                    },
                    "cluster": 0
                },
                {
                    "shipping_id": "75e5a2c0-6314-11ea-b657-ddd473c629a3",
                    "type": "dropoff",
                    "location_id": "114",
                    "coordinates": {
                        "lat": "-34.568707",
                        "lng": "-58.452963"
                    },
                    "cluster": 0                 
                }
            ]
        },
        {
            "vehicles_id": "1",
            "vehicles_location": {
                "lat": "",
                "lng": ""
            },
            "tour": [
                {
                    "shipping_id": "c83ac7c0-51c4-11ea-9aef-973de7785221",
                    "type": "pickup",
                    "location_id": "687",
                    "coordinates": {
                        "lat": "-34.592824",
                        "lng": "-58.375457"
                    },
                    "cluster": 1
                },
                {
                    "shipping_id": "b5a295c0-51c4-11ea-b36d-651ee769ca89",
                    "type": "pickup",
                    "location_id": "687",
                    "coordinates": {
                        "lat": "-34.592824",
                        "lng": "-58.375457"
                    },
                    "cluster": 1            
                }
            ]
        }
    ]
}

Desired Output enter image description here


Solution

  • You need to pass the record_path parameter to json_normalize.

    From the docs:

    record_path : str or list of str, default None

    Path in each object to list of records. If not passed, data will be assumed to be an array of records.

    import pandas as pd
    import json
    
    raw_json_data = """{contents_of_your_json_here}"""
    json_data = json.loads(raw_json_data)
    
    df = pd.json_normalize(json_data, ["solution", "tour"])
    

    Result: enter image description here