Search code examples
pythonpandasstringdataframecontains

check if string contains another column value


I have a dataframe and would like to check if a column value contains another column value.

    name1   name2
0   aa      aab
1   xyz     x

the below doesn't work

df = df.assign(name1_contains_name2=df.name1.str.contains(df.name2),
            name2_contains_name1=df.name2.str.contains(df.name1))

but I would like to get the below dataframe

    name1   name2   name1_contains_name2    name2_contains_name1
0   aa      aab     False                   True
1   xyz     x       True                    False

How can I write it?


Solution

  • If need test per rows use list comprehensions:

    z = list(zip(df.name1, df.name2))
    out = df.assign(name1_contains_name2=[b in a for a, b in z],
                   name2_contains_name1=[a in b for a, b in z])
    print (out)
      name1 name2  name1_contains_name2  name2_contains_name1
    0    aa   aab                 False                  True
    1   xyz     x                  True                 False
    

    Or use one list comprehension with DataFrame constructor:

    out = df.join(pd.DataFrame([[(b in a), (a in b)] for a, b in zip(df.name1, df.name2)], 
                      columns=['name1_contains_name','name2_contains_name1'],
                      index=df.index))
    
    print (out)
      name1 name2  name1_contains_name  name2_contains_name1
    0    aa   aab                False                  True
    1   xyz     x                 True                 False
    

    If need test per all values use | with join for regex OR:

    out = df.assign(name1_contains_name2=df.name1.str.contains('|'.join(df.name2)),
                    name2_contains_name1=df.name2.str.contains('|'.join(df.name1)))
    print (out)
    
      name1 name2  name1_contains_name2  name2_contains_name1
    0    aa   aab                 False                  True
    1   xyz     x                  True                 False
    

    Difference is possible seen in added new line to DataFrame:

    print (df)
      name1 name2
    0    aa   aab
    1   xyz     x
    2     b  xyza
    
    out = df.assign(name1_contains_name2=df.name1.str.contains('|'.join(df.name2)),
                    name2_contains_name1=df.name2.str.contains('|'.join(df.name1)))
    print (out)
      name1 name2  name1_contains_name2  name2_contains_name1
    0    aa   aab                 False                  True
    1   xyz     x                  True                 False
    2     b  xyza                 False                  True
    
    z = list(zip(df.name1, df.name2))
    out1 = df.assign(name1_contains_name2=[b in a for a, b in z],
                   name2_contains_name1=[a in b for a, b in z])
    print (out1)
      name1 name2  name1_contains_name2  name2_contains_name1
    0    aa   aab                 False                  True
    1   xyz     x                  True                 False
    2     b  xyza                 False                 False