Tags: python, string, pandas, nlp, contains

NLP: How do I search for a string that has brackets?


I'm trying to filter a dataframe to the rows where a column (fruit_name below) contains any substring from a list. The problem is that one substring, '(passiflora)', contains brackets that cause an error. Any solution? Thanks!

index   fruit_name
0       "apple"
1       "pear"
2       "passionfruit (Passiflora)"
4       "grape"

substring_list = ['apple', '(passiflora)']
df[df.fruit_name.str.contains('|'.join(substring_list))]

Solution

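  • Parentheses are special characters in a regular expression (they delimit a group), and str.contains treats its pattern as a regex by default. To match them literally, put a backslash \ before each one. Pass case=False as well, because the list entry is lower-case while the data contains "Passiflora":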

    import pandas as pd

    df = pd.DataFrame({'fruit_name': ["apple", "pear", "passionfruit (Passiflora)", "grape"]})

    # Raw string, so the backslashes reach the regex engine unchanged
    substring_list = ['apple', r'\(passiflora\)']
    print(df[df.fruit_name.str.contains('|'.join(substring_list), case=False)])
                      fruit_name
    0                      apple
    2  passionfruit (Passiflora)
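
If the list is long or comes from elsewhere, escaping each entry by hand gets error-prone. A possible alternative, not part of the original answer, is to let the standard library's re.escape do the escaping before joining. This is a minimal sketch that assumes every entry should be matched as literal text; the names pattern and matches are only illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({"fruit_name": ["apple", "pear", "passionfruit (Passiflora)", "grape"]})
substring_list = ["apple", "(passiflora)"]

# re.escape backslash-escapes regex metacharacters such as ( and ),
# so each entry is treated as literal text rather than as a pattern.
pattern = "|".join(re.escape(s) for s in substring_list)

# case=False because the list holds "(passiflora)" while the data has "(Passiflora)".
matches = df[df["fruit_name"].str.contains(pattern, case=False)]
print(matches)
```

With this approach the original, unescaped list can stay as it is, and any other special characters (such as . or +) in future entries are handled the same way.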