python json pandas dataframe json-normalize

How do I use json_normalize on a json containing a list of nested lists

I'm dealing with a complex JSON string in Python, and I'm having trouble getting the data into a Pandas Data Frame. Sample data, current output, and expected output are all below. I'm attempting to use json_normalize, but it isn't fully normalizing, leaving me with several columns containing JSON lists. My end goal is to insert this into a SQL DB, however the json strings are being truncated due to their length, so I can not use my initial plan to use OPENJSON() to perform the conversion in SQL.

My json_normalize looks like this below. Using max_level=10 was working for all the other API calls I was making for this resource except for this specific table. I've tried listing it out by column as well, but my attempts at this have been failing.

incomingJson = json.loads(incomingString)
incomingDataFrame = pd.json_normalize(incomingJson, 'fields', max_level=10)

incomingString looks like the json below. This is dummy data, just to clarify.


{
    "fields": [
        {
            "id": 12345,
            "description": "Location",
            "choices": [
                {
                    "id": 1,
                    "name": "Seattle"
                },
                {
                    "id": 2,
                    "name": "Nashville"
                }
            ],
            "other_choices": []
        },
        {
            "id": 67372,
            "description": "Name",
            "choices": [
                {
                    "id": 101,
                    "name": "John"
                },
                {
                    "id": 102,
                    "name": "Mary"
                }
            ],
            "other_choices": []
        },
        {
            "id": 39265,
            "description": "Department",
            "choices": [
                {
                    "id": 143,
                    "name": "IT"
                },
                {
                    "id": 226,
                    "name": "HR"
                },
                {
                    "id": 394,
                    "name": "ERG"
                }
            ],
            "other_choices": [
                {
                    "id": 9473
                }
            ]
        }
    ]
}

When I run my code, I get an output like the one below.

id	description	choices	other_choices
12345	Location	[{"id": 1, "name": "Seattle"}, {"id": 2, "name": "Nashville"}]	[]
67372	Name	[{"id": 101, "name": "John"}, {"id": 102, "name": "Mary"}]	[]
39265	Department	{"id": 143, "name": "IT"}, {"id": 226, "name": "HR"}, {"id": 394, "name": "ERG"}	[{"id": 9473}]

I am expecting an output more similar to the table below

id	description	choices.id	choices.name	other_choices.id
12345	Location	1	Seattle
12345	Location	2	Nashville
67372	Name	101	John
67372	Name	102	Mary
39265	Department	143	IT	9473
39265	Department	226	HR	9473
39265	Department	394	ERG	9473

Any and all help is appreciated.

Solution

The meta and record_path arguments proved useful.

df_c = pd.json_normalize(
    json["fields"],
    record_path=["choices"],
    meta=["id", "description"],
    record_prefix="choices.",
)

df_oc = pd.json_normalize(
    json["fields"],
    record_path=["other_choices"],
    meta=["id"],
    record_prefix="other_choices.",
)

col_order = [
    "id",
    "description",
    "choices.id",
    "choices.name",
    "other_choices.id",
]
df = pd.merge(df_c, df_oc, how="left", on="id")[col_order]

record_path points to the entries which are lists and meta points to entries that are single values. I could not find a way to point to multiple record_path, so I split the dfs and joined them.