Search code examples
jsonpandasdataframenested-json

Converting nested JSON object to pandas Dataframe


I am dealing with a JSON file which has nested fields (arrays). I am trying to convert the same into a Pandas dataframe.

{
    "_id": "2026",
    "dataDate": 1537920000000,
    "dataYear": 2018,
    "groupId": "1378",
    "HourConsumed": 19781.4,
    "HourGenerated": 0,
    "max": 4658.400000000001,
    "maxGen": 0,
    "maxTime": 1538001000000,
    "avg": -206.05625,
    "max": 0,
    "maxGen": 0,
    "maxTime": null,
    "avgTemp": 0,
    "me_Id": "2004506_3166155129",
    "interval": 15,
    "intervalMetaData": [
        "whC",
        "whG",
        "max",
        "maxGen",
        "hC",
        "hG",
        "maxVar",
        "maxGen",
        "avgTemp",
        "eventTime"
    ],
    "intervalData": [
        [
            175.2,
            0,
            700.8,
            0,
            0,
            0,
            0,
            0,
            0,
            1537920900000
        ],
        [
            192,
            0,
            768,
            0,
            0,
            0,
            0,
            0,
            0,
            1537921800000
        ],
        [
            191.39999999999998,
            0,
            765.5999999999999,
            0,
            0,
            0,
            0,
            0,
            0,
            1537922700000
        ]
    ]
}

I need to create separate columns of what's inside intervalMetaData, and then filling those columns with values from intervalData. Is it possible?


Solution

  • If I understand correctly, you just have to set your columns right by importing your list of lists with pandas:

    import pandas as pd
    
    data = {
        "_id": "2026",
        "dataDate": 1537920000000,
        "dataYear": 2018,
        "groupId": "1378",
        "HourConsumed": 19781.4,
        "HourGenerated": 0,
        "max": 4658.400000000001,
        "maxGen": 0,
        "maxTime": 1538001000000,
        "avg": -206.05625,
        "max": 0,
        "maxGen": 0,
        "maxTime": None,
        "avgTemp": 0,
        "me_Id": "2004506_3166155129",
        "interval": 15,
        "intervalMetaData": [
            "whC",
            "whG",
            "max",
            "maxGen",
            "hC",
            "hG",
            "maxVar",
            "maxGen",
            "avgTemp",
            "eventTime"
        ],
        "intervalData": [
            [
                175.2,
                0,
                700.8,
                0,
                0,
                0,
                0,
                0,
                0,
                1537920900000
            ],
            [
                192,
                0,
                768,
                0,
                0,
                0,
                0,
                0,
                0,
                1537921800000
            ],
            [
                191.39999999999998,
                0,
                765.5999999999999,
                0,
                0,
                0,
                0,
                0,
                0,
                1537922700000
            ]
        ]
    }
    
    
    df = pd.DataFrame(data["intervalData"], columns=data["intervalMetaData"])
    print(df)
    

    Output:

         whC  whG    max  maxGen  hC  hG  maxVar  maxGen  avgTemp      eventTime
    0  175.2    0  700.8       0   0   0       0       0        0  1537920900000
    1  192.0    0  768.0       0   0   0       0       0        0  1537921800000
    2  191.4    0  765.6       0   0   0       0       0        0  1537922700000
    

    Edit: you can add the other keys as columns with a loop:

    for k,v in data.items():
        if k not in ["intervalData", "intervalMetaData"]:
            df[k] = v