python pandas dataframe machine-learning data-cleaning

Explode List containing many dictionaries in Pandas dataframe

I am having a dataset which look like follows(in dataframe):

**_id** **paper_title**   **references**                                                                  **full_text**
 1         XYZ              [{'abc':'something','def':'something'},{'def':'something'},...many others]       something
 2         XYZ              [{'abc':'something','def':'something'},{'def':'something'},...many others]       something
 3         XYZ              [{'abc':'something'},{'def':'something'},...many others]                         something

Expected:

**_id** **paper_title**   **abc**    **def**                               **full_text**
   1         XYZ          something  something                               something               
                          something  something
                          .    
                          .
                         (all the dic in list with respect to_id column)
   2         XYZ          something  something                               something               
                          something  something
                          .    
                          .
                         (all the dic in list with respect to_id column)

I have tried df['column_name'].apply(pd.Series).apply(pd.Series) to split the list and dictionaries into columns of dataframe but doesn't help as it didn't split dictionaries.

First row of my dataframe: df.head(1)

Solution

After a lot of Documentation reading of pandas, I found the explode method applying with apply(pd.Series) is the easiest of what I was looking for in the question.

Here is the Code:

df = df.explode('reference')

# It explodes the lists to rows of the subset columns

df = df['reference'].apply(pd.Series).merge(df, left_index=True, right_index=True, how ='outer')

# split a list inside a Dataframe cell into rows and merge with original dataframe like (AUB) in set theory

Sidenote: while merging look for unique values in column as there will many columns with duplicated values

I hope this helps someone with dataframe/Series with columns having list containing multiple dictionaries and want to split list of multiple dictionaries key to new column with values as their rows.