Search code examples
pythonpandasstringdataframepartial

Check if comma separated values in a dataframe contains values from another dataframe in python


I have 2 dataframes

df1 has a list of users and assets

    print(df1)  
        User       Asset
    0   user_1   asset_1,asset_2,asset_3 
    1   user_2    asset_4  
    2   user_3    asset_5  

df2 gives a list of expired assets

    print(df2)  
        Asset
    0   asset_2 
    1   asset_5  
    2   asset_6  

I want to flag any expired assets in df1 as below

    print(df1)  
        User       Asset                    Flag
    0   user_1   asset_1,asset_2,asset_3   Expired
    1   user_2    asset_4                     
    2   user_3    asset_5                  Expired

I am using the below piece of code which i know is incorrect and need help with the right syntax which will help me compare comma separated values to values in another df

df1.loc[df1['Asset'].isin(df2['Asset']), 'Flag'] = 'Expired'

thank you for the help


Solution

  • One option is to split and explode the list of assets, merge it to df2 and set Flag when there's a match:

    In [129]: df1['Flag'] = ''
    
    In [130]: df1.loc[df1['Asset'].str.split(',').explode().reset_index().merge(df2)['index'].unique(), 'Flag'] = 'Expired'
    
    In [131]: df1
    Out[131]:
         User                    Asset     Flag
    0  user_1  asset_1,asset_2,asset_3  Expired
    1  user_2                  asset_4
    2  user_3                  asset_5  Expired