Assigning pandas columns based on set of reference list


The objective is to fill a new column, main_frame, with a reference label chosen according to which reference list each value of tw belongs to.

Currently, the operation is achieved as below:

import pandas as pd

watchlist_ref = [['A1','AA2','A3'],
                ['B1','BC2','B3']]
upper_ref = ['A','B']
df = pd.DataFrame({'tw': ['A1', 'AA2', 'A3', 'B1', 'BC2', 'B3']})

for ls_str, ws in zip(watchlist_ref, upper_ref):
    df.loc[(df['tw'].str.contains('|'.join(ls_str), case=False)), 'main_frame'] = ws

This gives the output below:

    tw main_frame
0   A1          A
1  AA2          A
2   A3          A
3   B1          B
4  BC2          B
5   B3          B

Is there a way to avoid the for-loop?


Solution

  • You can build a dictionary that maps each watchlist group, joined into a single regex pattern, to its reference label, and then use replace with regex=True to create the new column:

    d = {'|'.join(ls_str): ws for ls_str, ws in zip(watchlist_ref, upper_ref)}
    df['main_frame'] = df['tw'].replace(d, regex=True)
    

    Result:

         tw  main_frame
    0    A1           A
    1   AA2           A
    2    A3           A
    3    B1           B
    4   BC2           B
    5    B3           B
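
One caveat with the replace approach: with regex=True it substitutes only the matched part of each string and leaves non-matching values unchanged, so it reproduces the loop's output only when every tw value matches a pattern in full. If exact matches are all you need, a possible alternative is to flatten the reference lists into a plain lookup dictionary and use map. The sketch below is not from the original answer; the name lookup and the extra row 'Z9' are made up for illustration, and unlike the loop it is case-sensitive and does no substring matching.

```python
import pandas as pd

watchlist_ref = [['A1', 'AA2', 'A3'],
                 ['B1', 'BC2', 'B3']]
upper_ref = ['A', 'B']

# 'Z9' is a hypothetical extra row that belongs to no watchlist.
df = pd.DataFrame({'tw': ['A1', 'AA2', 'A3', 'B1', 'BC2', 'B3', 'Z9']})

# Flatten the nested lists into {watchlist value: reference label}.
lookup = {item: ws
          for ls_str, ws in zip(watchlist_ref, upper_ref)
          for item in ls_str}

# map does an exact dictionary lookup per value; values missing from
# the dictionary become NaN instead of keeping their original text.
df['main_frame'] = df['tw'].map(lookup)
print(df)
```

Here the unmatched row ends up as NaN in main_frame, which is also what the original loop would leave for it, whereas replace would copy 'Z9' through unchanged.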