
Explode function not working on simple python dataframe


I am having an issue with the pandas explode function. I have a two-column DataFrame:

pub_id category_for
pub.1155807502 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1153826092 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}]
pub.1145064359 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1145747691 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}]
pub.1144315107 [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}]

And I want to "explode" the "category_for" column to obtain something like this:

pub_id id name
pub.1155807502 80003 32 Biomedical and Clinical Sciences
pub.1155807502 80045 3202 Clinical Sciences
pub.1153826092 80003 32 Biomedical and Clinical Sciences
pub.1153826092 80232 5202 Biological Psychology
pub.1153826092 80045 3202 Clinical Sciences
pub.1153826092 80052 3209 Neurosciences
pub.1153826092 80023 52 Psychology

I tried

df = df.explode('category_for') 
df = pd.concat([df, df.pop("category_for").apply(pd.Series)], axis=1)

but nothing happens at the "explode" step.

I also tried:

df.set_index('pub_id')['category_for'].apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'category_for'})

but again without success.


Solution

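  • The lists of dicts in the category_for column are probably stored as strings, in which case explode has nothing to unpack and leaves the rows unchanged. You can check the type of the first item as follows.

    type(df.category_for.iloc[0])
    # returns str when the values are stored as strings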
    

    You can convert the type of the items by applying the literal_eval function.

    from ast import literal_eval
    
    # convert the column items from str to list of dicts
    df["category_for"] = df["category_for"].apply(literal_eval)
    

    Finally, you can explode the column, expand each dict into its own columns, and concatenate the result with the pub_id column.

    df = df.explode("category_for", ignore_index=True)
    
    df_result = pd.concat([df.pub_id, df.category_for.apply(pd.Series)], axis=1)
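
For reference, the steps above can be put together as a minimal end-to-end sketch. It rebuilds two rows of the question's data and assumes the category_for values arrived as text (for example after a CSV round trip), which is a guess about the asker's setup rather than something the question confirms. The variable names rows_before_parsing and exploded are only for illustration, and ignore_index needs pandas 1.1 or later.

```python
from ast import literal_eval

import pandas as pd

# Two rows from the question, with category_for stored as text
# (assumption: this is how the column was loaded).
df = pd.DataFrame(
    {
        "pub_id": ["pub.1155807502", "pub.1145064359"],
        "category_for": [
            "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
            "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
            "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
            "{'id': '80052', 'name': '3209 Neurosciences'}, "
            "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
        ],
    }
)

# explode() leaves scalar values such as strings untouched,
# so the row count does not change at this point.
rows_before_parsing = len(df.explode("category_for"))

# Parse each string into a real list of dicts.
df["category_for"] = df["category_for"].apply(literal_eval)

# One row per dict, with a fresh 0..n-1 index.
exploded = df.explode("category_for", ignore_index=True)

# Expand each dict into its own columns and reattach pub_id.
df_result = pd.concat(
    [exploded["pub_id"], exploded["category_for"].apply(pd.Series)],
    axis=1,
)

print(rows_before_parsing)
print(df_result)
```

If the type check already reports list rather than str, the literal_eval step should be skipped, since literal_eval only accepts strings (or AST nodes) and would raise on a list.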