
Identify strings having words from two different lists


I have a dataframe with three columns like this:

index   string                                         Result
1       The quick brown fox jumps over the lazy dog 
2       fast and furious was a good movie   

I also have two lists of words like this:

list1 = ["over", "dog", "movie"]
list2 = ["quick", "brown", "sun", "book"]

I want to identify strings that have at least one word from list1 AND at least one word from list2, such that the result will be as follows:

index   string                                         Result
1       The quick brown fox jumps over the lazy dog    TRUE
2       fast and furious was a good movie              FALSE

Explanation: The first sentence contains words from both lists, so the result is TRUE. The second sentence contains a word from list1 ("movie") but none from list2, so the result is FALSE.

Can this be done in Python? I tried search techniques from NLTK, but I don't know how to combine the results from the two lists. Thanks.


Solution

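  • Another option is to split each string into words and use set.intersection with all in a list comprehension. Note that this matches whitespace-separated tokens exactly, so it is case-sensitive and a token with attached punctuation (such as "dog.") will not match "dog":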

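    # Convert each word list to a set once, so lookups are fast
    s_lists = [set(list1), set(list2)]
    # True only if the string's words overlap with every set (an empty intersection is falsy)
    df['Result'] = [all(s_lst.intersection(s.split()) for s_lst in s_lists) for s in df['string'].tolist()]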
    

    Output:

       index                                       string  Result
    0      1  The quick brown fox jumps over the lazy dog    True
    1      2            fast and furious was a good movie   False
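
    For completeness, here is a self-contained sketch of the same set-intersection idea that can be run as-is. It rebuilds the example dataframe from the question. The helper name has_word_from_each is hypothetical. The lowercasing and the \w+ tokenization are assumptions added here so that capitalized words and trailing punctuation still match, and they are not part of the original answer:

```python
import re

import pandas as pd

list1 = ["over", "dog", "movie"]
list2 = ["quick", "brown", "sun", "book"]

# Example data copied from the question
df = pd.DataFrame({
    "index": [1, 2],
    "string": [
        "The quick brown fox jumps over the lazy dog",
        "fast and furious was a good movie",
    ],
})


def has_word_from_each(text, word_sets):
    """Return True if text shares at least one word with every set.

    Hypothetical helper: the name and the normalization are assumptions.
    """
    # Lowercase and keep only runs of word characters, so "Dog." matches "dog"
    words = set(re.findall(r"\w+", text.lower()))
    # An empty intersection is falsy, so all() requires a hit in every set
    return all(words & word_set for word_set in word_sets)


# Lowercase the lists too, so both sides of the comparison are normalized
word_sets = [{w.lower() for w in lst} for lst in (list1, list2)]
df["Result"] = [has_word_from_each(s, word_sets) for s in df["string"]]
print(df)
```

    If exact, case-sensitive matching is what you want, drop the .lower() calls and use text.split() instead of the regular expression, which brings this back to the behaviour of the one-liner above.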