I have a dataframe where an column cells contain lists of multiple values. I'd like to iterate each item in the list over all rows and check if the item exist in the column 'id', if the case add the value of other column "new_val" to the existing list.
Here's an example:
id new_val list_col
id1 n1 [id5,id3]
id2 nan [id6,id8]
id3 n7 [id1,id2]
id4 n4 []
taking as example the first row, where the list equal [id5,id3]. if we look over rows we see id3 exist in id so we get the value of column "new_val" and add it to the list [id5,id3,n7].
The expected result:
id new_val list_col update list_col
id1 n1 [id5,id3] [id5,id3,n7]
id2 nan [id6,id8] [id6,id8]
id3 n7 [id1,id2] [id1,id2,n1]
id4 n4 [] []
This solution worrk for me :
import pandas as pd
# create the initial dataset
data = {'id': ['id1', 'id2', 'id3', 'id4'],
'new_val': ['n1', '', 'n7', 'n4'],
'list_col': [['id5', 'id3'], ['id6', 'id8'], ['id1', 'id2'], []]}
df = pd.DataFrame(data)
# create a copy of the 'list_col' column to store the updated values
df['updated_list_col'] = df['list_col'].copy()
# iterate over each row
for i, row in df.iterrows():
# iterate over each item in the list_col
for item in row['list_col']:
# check if the item exists in the 'id' column
if item in df['id'].values:
# get the 'new_val' associated with the matching 'id'
new_val = df.loc[df['id'] == item, 'new_val'].values[0]
# add the 'new_val' to the list_col for the current row, if it's not an empty string
if new_val != '' and new_val not in row['updated_list_col']:
df.at[i, 'updated_list_col'].append(new_val)