Search code examples
pythonjsonpandasdataframe

Can I turn a Pandas Dataframe column containing a jJSON string into additional columns and rows?


I've run into a strange issue with an API I am calling. I'm getting my data as a JSON with a string containing a JSON inside of it. Instead of being nested, for some reason it comes across as a string field inside the JSON. I've gotten to a point where I can change this original JSON into a Pandas Dataframe that can be seen below, but I need to eventually drop this into a table in a SQL database, but the JSON I am working with is significantly longer than the sample I am working with here, and as a result, the JSON is being cut off due to the maximum size of a varchar I can work with in SQL.

Basically, is there any way for me to work with this dataframe in Python to translate the JSON into additional columns and rows on the dataframe?

This is the Dataframe I am currently working with:

Department Value json
1 A [{"employeeID":123,"name":"Jenny"}, {"employeeID":456,"name":"Mike"}, {"employeeID":789,"name":"Ricky"}]
2 B [{"employeeID":735,"name":"Todd", "badgeNo":84639}, {"employeeID":223,"name":"Greg", "badgeNo":93649}]
3 C []
4 D [{"employeeID":947,"name":"Cardi"}, {"employeeID":284,"name":"Tom"}]

I am trying to get my Dataframe to look like this

Department Value employeeID name badgeNo
1 A 123 Jenny
1 A 456 Mike
1 A 789 Ricky
2 B 735 Todd 84639
2 B 223 Greg 93649
3 C
4 D 947 Cardi
4 D 284 Tom

Any and all help is appreciated


Solution

  • If the values in column 'json' are strings, first convert them to JSON with json.loads.

    Then you can explode the values into new columns for each JSON object, and use json_normalize to convert them into dataframes.

    Finally, concat (on axis=1) the original dataframe minus column 'json' + the newly created dataframe.

    df["json"] = df["json"].apply(json.loads)
    df = df.explode("json").reset_index(drop=True)
    out = pd.concat([df.drop(columns="json"), pd.json_normalize(df["json"])], axis=1)
    
       Department Value  employeeID   name  badgeNo
    0           1     A       123.0  Jenny      NaN
    1           1     A       456.0   Mike      NaN
    2           1     A       789.0  Ricky      NaN
    3           2     B       735.0   Todd  84639.0
    4           2     B       223.0   Greg  93649.0
    5           3     C         NaN    NaN      NaN
    6           4     D       947.0  Cardi      NaN
    7           4     D       284.0    Tom      NaN