Hi I am attempting to reshape this json data within a dataframe using pandas.
id categories
1 3ee877e0 [{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]
Expected outcome
id entity_def_id permalink uuid value
3ee877e0 category blockchain 1fea6201 Blockchain
3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3ee877e0 category marketplace-772d 772da8fe Marketplace
3ee877e0 category software c08b5441 Software
Sorry for not posting my attempts at doing so, but I am new to python, and already know how to do it in mongodb and dataiku, just want to know of a way to do so using python.
You can try explode
categories
columns and then convert dictionaries in categories
column to multiple columns
out = (df.assign(categories=df['categories'].apply(eval))
.explode('categories', ignore_index=True)
.pipe(lambda df: df.join(pd.DataFrame(df.pop('categories').values.tolist()))))
print(out)
id entity_def_id permalink uuid value
0 3ee877e0 category blockchain 1fea6201 Blockchain
1 3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
2 3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3 3ee877e0 category marketplace-772d 772da8fe Marketplace
4 3ee877e0 category software c08b5441 Software