I am having an issue with the pandas explode function. I have a two-column DataFrame:
pub_id | category_for |
---|---|
pub.1155807502 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}] |
pub.1153826092 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}] |
pub.1145064359 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}] |
pub.1145747691 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80045', 'name': '3202 Clinical Sciences'}] |
pub.1144315107 | [{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, {'id': '80232', 'name': '5202 Biological Psychology'}, {'id': '80045', 'name': '3202 Clinical Sciences'}, {'id': '80052', 'name': '3209 Neurosciences'}, {'id': '80023', 'name': '52 Psychology'}] |
And I want to "explode" the "category_for" column to obtain something like this:
pub_id | id | name |
---|---|---|
pub.1155807502 | 80003 | 32 Biomedical and Clinical Sciences |
pub.1155807502 | 80045 | 3202 Clinical Sciences |
pub.1153826092 | 80003 | 32 Biomedical and Clinical Sciences |
pub.1153826092 | 80232 | 5202 Biological Psychology |
pub.1153826092 | 80045 | 3202 Clinical Sciences |
pub.1153826092 | 80052 | 3209 Neurosciences |
pub.1153826092 | 80023 | 52 Psychology |
I tried
df = df.explode('category_for')
df = pd.concat([df, df.pop("category_for").apply(pd.Series)], axis=1)
but nothing happens at the "explode" step.
I also tried:
df.set_index('pub_id')['category_for'].apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'category_for'})
but again without success.
The lists of dicts in the category_for column are probably stored as strings, which is why explode leaves them unchanged. You can check whether that's the case with the following:
>>> type(df.category_for.iloc[0])
<class 'str'>
If so, you can convert each item from a string to a list of dicts by applying the literal_eval function from the standard-library ast module.
from ast import literal_eval
# convert the column items from str to list of dicts
df["category_for"] = df["category_for"].apply(literal_eval)
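If some cells are already lists, or are missing, calling literal_eval on them directly raises an error. A minimal sketch of a more forgiving conversion, assuming a hypothetical helper named parse_cell (not part of pandas or ast) and a made-up sample Series:

```python
from ast import literal_eval

import pandas as pd


def parse_cell(value):
    """Hypothetical helper: parse stringified lists, pass everything else through."""
    if isinstance(value, str):
        return literal_eval(value)
    return value


# Made-up sample: one stringified list, one real list, one missing value.
mixed = pd.Series(
    [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}]",
        [{"id": "80045", "name": "3202 Clinical Sciences"}],
        None,
    ],
    dtype=object,
)

parsed = mixed.apply(parse_cell)
print(parsed.map(type))
```

Only the string cell is parsed here. The real list and the missing value are returned as they are, so the conversion can be rerun safely on a column that has already been converted.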
Finally, you can use explode and concatenate the result with the pub_id column.
df = df.explode("category_for", ignore_index=True)
df_result = pd.concat([df.pub_id, df.category_for.apply(pd.Series)], axis=1)
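Putting the steps together, here is a self-contained sketch on a made-up two-row sample shaped like the question's data. As an assumption on my part, it swaps apply(pd.Series) for pd.json_normalize, which should give the same id and name columns and is often quicker on larger frames.

```python
from ast import literal_eval

import pandas as pd

# Made-up two-row sample in the shape of the question's data,
# with each list of dicts stored as a string.
df = pd.DataFrame({
    "pub_id": ["pub.1155807502", "pub.1145064359"],
    "category_for": [
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}, "
        "{'id': '80045', 'name': '3202 Clinical Sciences'}]",
        "[{'id': '80003', 'name': '32 Biomedical and Clinical Sciences'}]",
    ],
})

# On string cells explode has nothing to unpack, so the row count stays the same.
rows_before_parsing = len(df.explode("category_for"))

# Parse the strings into real lists of dicts, then give each dict its own row.
df["category_for"] = df["category_for"].apply(literal_eval)
df = df.explode("category_for", ignore_index=True)

# Swapped-in alternative to apply(pd.Series): expand the dicts into columns.
expanded = pd.json_normalize(df["category_for"].tolist())
df_result = pd.concat([df["pub_id"], expanded], axis=1)
print(df_result)
```

Because explode is called with ignore_index=True, both pieces passed to pd.concat share the same 0..n-1 index, so the rows line up without any further resetting.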