Check if a substring is in a string in Python


I have two DataFrames. I need to check whether each word from the first DataFrame occurs as a substring in any string of the second DataFrame, and get a list of the words that are found there.

First df (word):

word
apples
dog
cat
cheese

Second df (sentence):

sentence
apples grow on a tree
...
I love cheese

I tried this one:

tru=[]
for i in word['word']:
    if i in sentence['sentence'].values:    
        tru.append(i)

And this one:

tru=[]
for i in word['word']:
    if sentence['sentence'].str.contains(i):    
        tru.append(i)

I expect to get a list like ['apples',..., 'cheese']


Solution

  • One possible way is to join the (escaped) words into a single regex alternation and use Series.str.extractall:

    import re
    import pandas as pd
    
    df_word = pd.Series(["apples", "dog", "cat", "cheese"])
    df_sentence = pd.Series(["apples grow on a tree", "i love cheese"])
    
    pattern = f"({'|'.join(df_word.apply(re.escape))})"
    matches = df_sentence.str.extractall(pattern)
    matches
    

    Output:

    
                  0
      match
    0 0      apples
    1 0      cheese
    

    You can then convert the results to a list:

    matches[0].unique().tolist()
    

    Output:

    ['apples', 'cheese']
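
  • Another possible way, closer to the second attempt in the question, is to keep the loop and reduce the result of Series.str.contains with .any(). The sketch below assumes the DataFrame and column names from the question (word['word'] and sentence['sentence']); the name found is only illustrative.

```python
import pandas as pd

# Sample data mirroring the question (names assumed from the question text).
word = pd.DataFrame({"word": ["apples", "dog", "cat", "cheese"]})
sentence = pd.DataFrame({"sentence": ["apples grow on a tree", "I love cheese"]})

# str.contains returns a boolean Series (one value per sentence), so it
# cannot be used directly in an `if`; .any() reduces it to a single bool.
# regex=False treats each word as a literal substring, not as a pattern.
found = [
    w
    for w in word["word"]
    if sentence["sentence"].str.contains(w, regex=False).any()
]
print(found)  # ['apples', 'cheese']
```

    This is plain, case-sensitive substring matching, so a word such as "cat" would also match inside "category". If only whole words should count, a regex with word boundaries (for example r"\b" + re.escape(w) + r"\b" with the default regex=True) is one option.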