Search code examples
pythonpandasdataframefor-loopsubstring

Check if one of elements in list is in dataframe column


I have DF that looks like below:

columna_1   column_2
1           I am Thomas
2           Are you Thomas
3           How are you?
4           I am fine...
5           Jack, what's up?

and I have a list like this:

names = ["Thomas", "Jack"]

What I need is to find values from list names if there are in column_2 and if there are, it should add new column with the values from list. So it should be like this:

columna_1   column_2          column_3
1           I am Thomas       Thomas
2           Are you Thomas    Thomas
5           Jack, what's up?  Jack

I have started with finding just values but I am not sure how I could add a new column with correct values. Below is my code:

df[df['column_2'].astype("str").apply(lambda x: set(names).issubset(x))]

Do you have any idea? Thanks for help!


Solution

  • Use a regex:

    import re
    regex = fr'\b({"|".join(map(re.escape, names))})\b'
    df['column_3'] = df['column_2'].str.extract(regex, expand=False)
    

    To also drop the non matches:

    import re
    regex = fr'\b({"|".join(map(re.escape,names))})\b'
    (df.assign(column_3=df['column_2'].str.extract(regex, expand=False))
       .dropna(subset=['column_3'])
    )
    

    output:

       columna_1          column_2 column_3
    0          1       I am Thomas   Thomas
    1          2    Are you Thomas   Thomas
    4          5  Jack, what's up?     Jack
    

    NB. Note that in case both Thomas and Jack are in the sentence, only the first one in the sentence will match