I'm trying to filter a dataframe to keep the rows where fruit_name contains any substring from a list. The problem is that one substring (in bold below) contains brackets, which cause an error. Any solution? Thanks!
index fruit_name
0 "apple"
1 "pear"
2 "passionfruit (Passiflora)"
4 "grape"
substring_list = ['apple',**'(passiflora)'**]
df[df.fruit_name.str.contains('|'.join(substring_list))]
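Brackets like () are special characters in regex (they define a group), so you need to escape each one with a backslash \ to match it literally. Passing case=False also lets the lowercase passiflora match Passiflora: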
import pandas as pd

df = pd.DataFrame({'fruit_name': ["apple", "pear", "passionfruit (Passiflora)", "grape"]})
# raw string so the backslashes reach the regex engine unchanged
substring_list = ['apple', r'\(passiflora\)']
print(df[df.fruit_name.str.contains('|'.join(substring_list), case=False)])
fruit_name
0 apple
2 passionfruit (Passiflora)
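
If the list comes from somewhere else and you would rather not escape it by hand, one option is to let re.escape from the standard library do it for every item before joining. This is a minimal sketch using the same sample data; the names pattern and result are just for illustration, and it assumes every entry should be matched as literal text rather than as a regex:

```python
import re
import pandas as pd

df = pd.DataFrame({'fruit_name': ["apple", "pear", "passionfruit (Passiflora)", "grape"]})
substring_list = ['apple', '(passiflora)']

# re.escape backslash-escapes regex metacharacters, so each item matches literally
pattern = '|'.join(re.escape(s) for s in substring_list)

# case=False so the lowercase 'passiflora' still matches 'Passiflora'
result = df[df.fruit_name.str.contains(pattern, case=False)]
print(result)
```

This keeps substring_list readable and also covers other special characters such as . + ? [ ] that might appear in the list later.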