Search code examples
pythonpandaslistdataframeiteration

Pandas dataframe : Iterate over rows an item in column that contains list


I have a dataframe where an column cells contain lists of multiple values. I'd like to iterate each item in the list over all rows and check if the item exist in the column 'id', if the case add the value of other column "new_val" to the existing list.

Here's an example:

id  new_val list_col
id1 n1  [id5,id3]
id2 nan [id6,id8]
id3 n7  [id1,id2]
id4 n4  []

taking as example the first row, where the list equal [id5,id3]. if we look over rows we see id3 exist in id so we get the value of column "new_val" and add it to the list [id5,id3,n7].

The expected result:

id  new_val list_col    update list_col
id1 n1  [id5,id3]   [id5,id3,n7]
id2 nan [id6,id8]   [id6,id8]
id3 n7  [id1,id2]   [id1,id2,n1]
id4 n4  []  []

Solution

  • This solution worrk for me :

    import pandas as pd
    
    # create the initial dataset
    data = {'id': ['id1', 'id2', 'id3', 'id4'],
            'new_val': ['n1', '', 'n7', 'n4'],
            'list_col': [['id5', 'id3'], ['id6', 'id8'], ['id1', 'id2'], []]}
    df = pd.DataFrame(data)
    
    # create a copy of the 'list_col' column to store the updated values
    df['updated_list_col'] = df['list_col'].copy()
    
    # iterate over each row
    for i, row in df.iterrows():
        # iterate over each item in the list_col
        for item in row['list_col']:
            # check if the item exists in the 'id' column
            if item in df['id'].values:
                # get the 'new_val' associated with the matching 'id'
                new_val = df.loc[df['id'] == item, 'new_val'].values[0]
                # add the 'new_val' to the list_col for the current row, if it's not an empty string
                if new_val != '' and new_val not in row['updated_list_col']:
                    df.at[i, 'updated_list_col'].append(new_val)