python json python-3.x pandas json-normalize

How to import multiple json sections (all in one file) in to python/pandas

I'm trying to process a json file for use in another program that will use an excel file. My json file has multiple sections/arrays, one for some basic info like number of records, report name. It has sections/arrays for column names and then another that has each individual row/record as an array.

I've tried using pandas.read_json and the json_loads options to read the data in but I continually get errors. If I remove all sections except just one (like the rows) section, I can get it to read that in (although it puts all columns in one column like it's not identifying each column name separated by a comma as a separate column.

Ideally I'd like to not need to manually edit this file and just process it in python into a pandas dataframe so I can further manipulate it and export it for other use.

Any help would be greatly appreciated. This is my first post so if there's anything I can do better let me know!

Here's a representation of the json data, the actual data has several more columns and many more rows/records (700+ usually)

{
"count": 2,
"name": "report1",
"columnNames": [
    "Incident ID",
    "Status",
    "Subject"
],
"rows": [
    [
        "2460636",
        "Resolved",
        "login help"
    ],
    [
        "2460637",
        "Unresolved",
        "email help"
    ]
    ]
}

I'm trying to get the columnNames section to become the column names in a pandas dataframe and each "row" to be a record in the dataframe.

I've tried looking at other examples but I haven't come across a similar issue with json formatted like this.

I've tried using pandas.read_json("example.json") as well as the json.loads to load the data to get it in but they all come up with different errors that I can't seem to get around.

When Running the pandas.read_json("example.json") it comes back saying "arrays must all be same length".

Result should be the columnNames section/array should be the column names of the pandas dataframe and then each "row" I want to be a record in the dataframe.

Solution

Use `pd.json_normalize`: to unpack your `json` file

pd.json_normalize

import pandas as pd
import json

with open('test.json') as f:
    data = json.load(f)

json_data = pd.json_normalize(data)

Output:

                       columnNames  count  name                            rows
0   [Incident ID, Status, Subject]  2   report1 [[2460636, Resolved, login help], [2460637, Un...

Unpack rows:

df_rows = pd.json_normalize(data, record_path=['rows'], meta=['name'])
df_rows.rename({0: data['columnNames'][0],
                1: data['columnNames'][1],
                2: data['columnNames'][2]}, axis=1, inplace=True)

Output of df_row:

     Incident ID        Status       Subject       name
0        2460636      Resolved    login help    report1
1        2460637    Unresolved    email help    report1

The json isn't particularly well formed, something like the following would be easier to unpack:

{
    "count": 2,
    "name": "report1",
    "rows": [{
            "Incident ID": "2460636",
            "Status": "Resolved",
            "Subject": "login help"
        }, {
            "Incident ID": "2460637",
            "Status": "Unresolved",
            "Subject": "email help"
        }
    ]
}

How to import multiple json sections (all in one file) in to python/pandas

Use pd.json_normalize: to unpack your json file

Use `pd.json_normalize`: to unpack your `json` file