Search code examples
pythonpandaspython-3.7unnest

Explode multiple columns in Pandas


I have researched this problem and found out that Pandas' explode function does not work on multiple columns, however, I have seen a few questions submitted on StackOverflow however, none of them seem to work for me.

Dataset:

j = { 
    "_id" : "5c45", 
    "user" : 5, 
    "ids" : [
        "1019", 
        "1021", 
        "1162"
    ], 
    "roles" : ["2d7f"]
}

Current Script:

root = json_normalize(j)
x = (root.applymap(type) == list).all()
y = x.index[x].tolist()
root = root.apply(lambda x: [str(v).split(',') for v in x]).apply(pd.Series.explode)

print(root)

I tried this solution here, but I get a value error:

ValueError: cannot reindex from a duplicate axis

Expected Result:

_id,user,ids,roles
5c45,5,1019,2d7f
5c45,5,1021,2d7f
5c45,5,1162,2d7f

Is there a simple, yet effective workaround to this?


Solution

  • Try record_path and meta options:

    pd.json_normalize(j, record_path=['ids'], meta=['_id','user','roles'])
    

    Output:

          0   _id user roles
    0  1019  5c45    5  2d7f
    1  1021  5c45    5  2d7f
    2  1162  5c45    5  2d7f
    

    For a somewhat dynamic solution, try flatten the singletons:

    pd.DataFrame({k:v[0] if isinstance(v, list) and len(v) == 1 else v
                  for k,v in j.items()
                 })