In a dataframe
import pandas as pd

df = pd.DataFrame({'colA': ['id1', 'id2', 'id3', 'id4', 'id5'],
                   'colB': ['Black cat', 'Black mouse', 'Black_A cat', 'Black cat', 'White_A mouse']})
I want to find all the rows where colB contains Black cat. My command
df[df['colB'].str.contains('Black cat', na=False)]
finds only
colA colB
0 id1 Black cat
3 id4 Black cat
while I expect this:
colA colB
0 id1 Black cat
2 id3 Black_A cat
3 id4 Black cat
Why isn't the partial match Black_A cat found?
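What counts as a partial match in your case? contains looks for the pattern as a substring of each value. By default the pattern is treated as a regular expression, but 'Black cat' has no special characters, so it only matches that exact text. Black_A cat does not contain the substring Black cat, so it is not matched. If you expect optional characters between Black and cat, you have to specify that in the pattern: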
df[df['colB'].str.contains('Black.*cat', na=False)]
# '.*' matches any characters (including none) between 'Black' and 'cat'
Output:
colA colB
0 id1 Black cat
2 id3 Black_A cat
3 id4 Black cat
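Below is a self-contained sketch that compares three variants on the same data: a literal substring search (regex=False), the loose 'Black.*cat' pattern from above, and a tighter pattern. Keep in mind that '.*' will also match values such as "Blackcat" or "Black mouse and cat". The tighter pattern r'Black(?:_\w+)?\s+cat' is only an assumption about what your variants look like (Black, an optional _suffix, whitespace, then cat), so adjust it to your real data. It uses a non-capturing group (?:...) because str.contains warns when the pattern has capturing groups.

```python
import pandas as pd

df = pd.DataFrame({'colA': ['id1', 'id2', 'id3', 'id4', 'id5'],
                   'colB': ['Black cat', 'Black mouse', 'Black_A cat', 'Black cat', 'White_A mouse']})

# Literal substring search: 'Black_A cat' does not contain the text 'Black cat'
literal = df[df['colB'].str.contains('Black cat', regex=False, na=False)]

# Loose regex: 'Black', then any characters (possibly none), then 'cat'
loose = df[df['colB'].str.contains('Black.*cat', na=False)]

# Tighter regex (assumed variant shape): 'Black', an optional '_' plus word
# characters, at least one whitespace character, then 'cat'
tight = df[df['colB'].str.contains(r'Black(?:_\w+)?\s+cat', na=False)]

print(literal, loose, tight, sep='\n\n')

# Values where the loose and tight patterns disagree
extra = pd.Series(['Black mouse and cat', 'Blackcat'])
print(extra.str.contains('Black.*cat').tolist())              # loose pattern
print(extra.str.contains(r'Black(?:_\w+)?\s+cat').tolist())   # tight pattern
```

On the question's data, loose and tight return the same rows (id1, id3, id4); they only differ on values like the ones in extra, where '.*' happily spans other words.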