Tags: pandas, string, list, new-operator, matching

How to match multiple words from a list against a pandas DataFrame column


I have a list like this:

keyword_list = ['motorcycle love hobby ', 'bike love me', 'cycle', 'dirtbike cycle motorbike ']

I want to search for these words in a pandas DataFrame column and, if 3 words match, create a new column containing those words.

I need something like this:

[Image: example of the expected output]


Solution

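  • You can use set operations: turn each keyword string and each description into a set of words, then check whether the keyword's word set is a subset of the description's word set: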

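    import pandas as pd
    
    # keyword_list is the list from the question; df is rebuilt from the output shown below
    df = pd.DataFrame({'description': ['I love motorcycle though I have other hobby',
                                       'I have bike']})
    
    # map each keyword string to the set of words it contains
    kw = {s: set(s.split()) for s in keyword_list}
    
    def subset(s):
        # return the first keyword string whose words all occur in s, else None
        S1 = set(s.split())
        for k, S2 in kw.items():
            if S2.issubset(S1):
                return k
    
    # lowercase the descriptions so the comparison is case-insensitive
    df['trigram'] = [subset(s) for s in df['description'].str.lower()]
    
    print(df)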
    

    Output:

                                       description                 trigram
    0  I love motorcycle though I have other hobby   motorcycle love hobby 
    1                                  I have bike                    None
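
A possible variation on the set-based approach, sketched under a few assumptions that the original post does not state: only keyword entries with exactly three words should count (so 'cycle' is skipped), case and punctuation should be ignored, and a missing description should simply give no match. The names kw3, words_of and match_trigram are hypothetical, and the last two sample rows are invented for illustration.

```python
import re

import pandas as pd

keyword_list = ['motorcycle love hobby ', 'bike love me', 'cycle',
                'dirtbike cycle motorbike ']

df = pd.DataFrame({'description': [
    'I love motorcycle though I have other hobby',
    'I have bike',
    'Bike? Love me!',  # hypothetical row to exercise punctuation handling
    None,              # hypothetical row to exercise missing values
]})

# Assumption: keep only entries with exactly three words.
# Keys are re-joined so they carry no trailing whitespace.
kw3 = {' '.join(words): set(words)
       for words in (s.split() for s in keyword_list)
       if len(words) == 3}


def words_of(text):
    """Return the lowercased word tokens of text as a set (empty if missing)."""
    if not isinstance(text, str):
        return set()
    return set(re.findall(r'\w+', text.lower()))


def match_trigram(text):
    """Return the first three-word keyword fully contained in text, else None."""
    found = words_of(text)
    for key, needed in kw3.items():
        if needed <= found:  # subset test, same idea as issubset
            return key
    return None


df['trigram'] = [match_trigram(s) for s in df['description']]
print(df)
```

Tokenizing with a regular expression rather than str.split means 'Bike?' and 'bike' compare equal. Whether that is wanted depends on the data, so treat it as one option rather than the required behaviour.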