python pandas string numpy find

DataFrame string exact match


I am using the str.contains method to look for specific rows in a DataFrame. However, I want an exact string match, and even when I add regex=False it still picks up partial matches.

str.find doesn't work either. Is there another function I should be using for this? Here is some code to reproduce the problem:

import numpy as np
import pandas as pd

data = {'A': ['tree', 'bush', 'forest', 'tree/red']}

df_test = pd.DataFrame(data)

# str.contains does substring matching, so 'tree/red' is also marked 'Good'
df_test['New'] = np.where(df_test['A'].str.contains('tree', regex=False) |
                          df_test['A'].str.contains('bush') |
                          df_test['A'].str.contains('forest'),
                          'Good', '')

So I would like the code above to match only rows that are exactly 'tree', 'bush' or 'forest'; however, it also picks up rows that say 'tree/red'.



Solution

  • str.contains checks whether the substring occurs anywhere in the string, so either you write a regex anchored to the whole string to get the required output, or you can simply use a different approach.

    Here is how I would do it:

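    import pandas as pd

    data = {'A': ['tree', 'bush', 'forest', 'tree/red']}
    good = ['tree', 'bush', 'forest']  # values that must match exactly
    df_test = pd.DataFrame(data)
    # `in` on a list compares whole strings, so 'tree/red' is labelled 'Bad'
    df_test['New'] = df_test.apply(lambda row: 'Good' if row['A'] in good else 'Bad', axis=1)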
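
If you would rather stay with np.where as in the question, a vectorized variant is possible. The following is only a sketch, not part of the answer above: it uses Series.isin for whole-value comparison and, as an alternative, Series.str.fullmatch with an escaped pattern (this assumes pandas 1.1 or later, where str.fullmatch is available). The column name New_regex is made up for illustration.

```python
import re

import numpy as np
import pandas as pd

df_test = pd.DataFrame({'A': ['tree', 'bush', 'forest', 'tree/red']})
good = ['tree', 'bush', 'forest']

# isin compares whole cell values, so 'tree/red' does not match 'tree'
df_test['New'] = np.where(df_test['A'].isin(good), 'Good', '')

# Alternative: a regex that must match the entire string.
# re.escape guards against special characters in the search terms.
pattern = '|'.join(re.escape(word) for word in good)
df_test['New_regex'] = np.where(df_test['A'].str.fullmatch(pattern), 'Good', '')

print(df_test)
```

Both columns mark the first three rows as 'Good' and leave the 'tree/red' row empty, which matches the '' default used in the question rather than 'Bad'.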