Search code examples
pythonpandasdataframedataiku

reshape nested json data in a dataframe using python to get desired output


Hi I am attempting to reshape this json data within a dataframe using pandas.

      id        categories
1     3ee877e0  [{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]

Expected outcome

id        entity_def_id  permalink         uuid        value
3ee877e0  category       blockchain        1fea6201    Blockchain
3ee877e0  category       cryptocurrency    bd082f4d    Cryptocurrency
3ee877e0  category       loyalty-programs  4a45af54    Loyalty Programs
3ee877e0  category       marketplace-772d  772da8fe    Marketplace
3ee877e0  category       software          c08b5441    Software

Sorry for not posting my attempts at doing so, but I am new to python, and already know how to do it in mongodb and dataiku, just want to know of a way to do so using python.


Solution

  • You can try explode categories columns and then convert dictionaries in categories column to multiple columns

    out = (df.assign(categories=df['categories'].apply(eval))
           .explode('categories', ignore_index=True)
           .pipe(lambda df: df.join(pd.DataFrame(df.pop('categories').values.tolist()))))
    
    print(out)
    
             id entity_def_id         permalink      uuid             value
    0  3ee877e0      category        blockchain  1fea6201        Blockchain
    1  3ee877e0      category    cryptocurrency  bd082f4d    Cryptocurrency
    2  3ee877e0      category  loyalty-programs  4a45af54  Loyalty Programs
    3  3ee877e0      category  marketplace-772d  772da8fe       Marketplace
    4  3ee877e0      category          software  c08b5441          Software