I have a dataframe and would like to check if a column value contains another column value.
name1 name2
0 aa aab
1 xyz x
the below doesn't work
df = df.assign(name1_contains_name2=df.name1.str.contains(df.name2),
name2_contains_name1=df.name2.str.contains(df.name1))
but I would like to get the below dataframe
name1 name2 name1_contains_name2 name2_contains_name1
0 aa aab False True
1 xyz x True False
How can I write it?
If need test per rows use list comprehensions:
z = list(zip(df.name1, df.name2))
out = df.assign(name1_contains_name2=[b in a for a, b in z],
name2_contains_name1=[a in b for a, b in z])
print (out)
name1 name2 name1_contains_name2 name2_contains_name1
0 aa aab False True
1 xyz x True False
Or use one list comprehension with DataFrame constructor:
out = df.join(pd.DataFrame([[(b in a), (a in b)] for a, b in zip(df.name1, df.name2)],
columns=['name1_contains_name','name2_contains_name1'],
index=df.index))
print (out)
name1 name2 name1_contains_name name2_contains_name1
0 aa aab False True
1 xyz x True False
If need test per all values use |
with join for regex OR
:
out = df.assign(name1_contains_name2=df.name1.str.contains('|'.join(df.name2)),
name2_contains_name1=df.name2.str.contains('|'.join(df.name1)))
print (out)
name1 name2 name1_contains_name2 name2_contains_name1
0 aa aab False True
1 xyz x True False
Difference is possible seen in added new line to DataFrame:
print (df)
name1 name2
0 aa aab
1 xyz x
2 b xyza
out = df.assign(name1_contains_name2=df.name1.str.contains('|'.join(df.name2)),
name2_contains_name1=df.name2.str.contains('|'.join(df.name1)))
print (out)
name1 name2 name1_contains_name2 name2_contains_name1
0 aa aab False True
1 xyz x True False
2 b xyza False True
z = list(zip(df.name1, df.name2))
out1 = df.assign(name1_contains_name2=[b in a for a, b in z],
name2_contains_name1=[a in b for a, b in z])
print (out1)
name1 name2 name1_contains_name2 name2_contains_name1
0 aa aab False True
1 xyz x True False
2 b xyza False False