Search code examples
pythonpandasdataframetextcontains

Python Pandas - Cannot recognize a string from a column in another dataframe column


I've a dataframe with the following data:

enter image description here

Now I am trying to use the isIn method in order to produce a new column with the result if the col_a is in col_b.So in this case I am trying to produce the following output:

enter image description here

For this I am using this code:

df['res'] = df.col_a.isin(df.col_b)

But it's always return FALSE. I also try this: df['res'] = df.col_b.isin(df.col_a) but with the same result... all the rows as FALSE.

What I am doing wrong?

Thanks!


Solution

  • You can check if value in col_a is in col_b per rows by apply:

    df['res'] = df.apply(lambda x: x.col_a in x.col_b, axis=1)
    

    Or by list comprehension:

    df['res'] = [a in b for a, b in zip(df.col_a, df.col_b)]
    

    EDIT: Error obviously mean there are missing values, so if-else statement is necessary:

    df = pd.DataFrame({'col_a':['SQL','Java','C#', np.nan, 'Python', np.nan],
                       'col_b':['I.like_SQL_since_i_used_to_ETL',
                                'I like_programming_SQL.too',
                                'I prefer Java',
                                'I like beer',
                                 np.nan,
                                 np.nan]})
    print (df)
    
    df['res'] = df.apply(lambda x: x.col_a in x.col_b 
                                   if (x.col_a == x.col_a) and (x.col_b == x.col_b) 
                                   else False, axis=1)
    
    df['res1'] = [a in b if (a == a) and (b == b) else False for a, b in zip(df.col_a, df.col_b)]
    print (df)
        col_a                           col_b    res   res1
    0     SQL  I.like_SQL_since_i_used_to_ETL   True   True
    1    Java      I like_programming_SQL.too  False  False
    2      C#                   I prefer Java  False  False
    3     NaN                     I like beer  False  False
    4  Python                             NaN  False  False
    5     NaN                             NaN  False  False