python pandas dataframe if-statement any

Checking if any word in a string appears in a list using python

I have a pandas dataframe that contains a column of several thousand comments. I would like to iterate through every row in the column, check whether the comment contains any word from a list of words I've created, and, if it does, label it as such in a separate column. This is what I have so far in my code:

import re

retirement_words_list = ['match','matching','401k','retirement','retire','rsu','rrsp']

def word_checker(row):
    for sentence in df['comments']: 
        if any(word in re.findall(r'\w+', sentence.lower()) for word in retirement_words_list):
            return '401k/Retirement'
        else:
            return 'Other'

df['topic'] = df.apply(word_checker,axis=1)

The code is labeling every single comment in my dataframe as 'Other' even though I have double-checked that many comments contain one or several of the words from my list. Any ideas for how I may correct my code? I'd greatly appreciate your help.

Solution

It's probably more convenient to have a set version of retirement_words_list (for efficient membership testing) and then loop over the words in the sentence, checking each one for membership in this set, rather than the other way round:

retirement_words_list = ['match','matching','401k','retirement','retire','rsu','rrsp']

retirement_words_set = set(retirement_words_list)

and then

if any(word in retirement_words_set for word in sentence.lower().split()):
    # .... etc ....

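You want whole-word matches here; otherwise it wouldn't make sense to include 'matching' and 'retirement' on the list, given that 'match' and 'retire' are already included. Your re.findall(r'\w+', ...) call already splits the sentence into whole words, and split() does the same on whitespace. Once the sentence is a list of words, you can reverse the logic and look each word up in the set. Note that split() leaves punctuation attached (a comment ending in "retire." yields the token "retire."), so keeping your re.findall tokenizer may be the safer choice.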
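To illustrate the tokenization point, here is a small sketch comparing split() with re.findall(r'\w+', ...) on a made-up comment (the sample sentence is hypothetical) and intersecting each token list with the set:

```python
import re

retirement_words_set = {'match', 'matching', '401k', 'retirement', 'retire', 'rsu', 'rrsp'}

# Hypothetical sample comment
sentence = "Does the company match my 401k? I plan to retire."

# str.split() only breaks on whitespace, so punctuation stays attached:
# ['does', 'the', 'company', 'match', 'my', '401k?', 'i', 'plan', 'to', 'retire.']
split_words = sentence.lower().split()

# re.findall(r'\w+', ...) keeps only runs of word characters:
# ['does', 'the', 'company', 'match', 'my', '401k', 'i', 'plan', 'to', 'retire']
regex_words = re.findall(r'\w+', sentence.lower())

# Words from the set found by each tokenizer (sorted for a stable display order)
print(sorted(retirement_words_set.intersection(split_words)))  # ['match']
print(sorted(retirement_words_set.intersection(regex_words)))  # ['401k', 'match', 'retire']
```

With split(), "401k?" and "retire." are missed, so only 'match' is found.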

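NOTE: This is most likely why every comment ends up as 'Other': your function word_checker takes a parameter called row which it never uses. Instead it loops over the whole df['comments'] column and returns (in either branch) on the very first comment, so every row gets the label of the first comment in the column. Possibly what you meant to do was something like: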

def word_checker(sentence):
    # sentence is the text of a single comment
    if any(word in retirement_words_set for word in sentence.lower().split()):
        return '401k/Retirement'
    else:
        return 'Other'

and:

df['topic'] = df['comments'].apply(word_checker)

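where sentence is the text of each row's comment: Series.apply calls word_checker once per element of the comments column, so unlike DataFrame.apply it does not need an axis argument.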
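Putting it together, here is a minimal end-to-end sketch on a hypothetical three-row DataFrame. It reproduces the original bug (buggy_word_checker is the question's function, renamed, with the set swapped in for the list), applies a corrected per-comment function (using re.findall rather than split() so trailing punctuation is dropped), and, as an alternative not taken from the answer above, a vectorized Series.str.contains check with a word-boundary regex. The sample comments and the buggy_topic / topic_vectorized column names are assumptions for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the real comments column
df = pd.DataFrame({'comments': [
    "Great team and good snacks.",
    "The 401k match is generous.",
    "I plan to retire early thanks to the RSU grants.",
]})

retirement_words_set = {'match', 'matching', '401k', 'retirement', 'retire', 'rsu', 'rrsp'}


def buggy_word_checker(row):
    # Original logic: ignores `row`, loops over the whole column and returns
    # on the first comment, so every row gets the first comment's label.
    for sentence in df['comments']:
        if any(word in re.findall(r'\w+', sentence.lower()) for word in retirement_words_set):
            return '401k/Retirement'
        else:
            return 'Other'


def word_checker(sentence):
    # Tokenize one comment into lowercase words (punctuation dropped),
    # then look each word up in the set.
    words = re.findall(r'\w+', sentence.lower())
    if any(word in retirement_words_set for word in words):
        return '401k/Retirement'
    return 'Other'


df['buggy_topic'] = df.apply(buggy_word_checker, axis=1)
df['topic'] = df['comments'].apply(word_checker)

# Vectorized alternative: one regex with \b word boundaries around the words
pattern = r'\b(?:' + '|'.join(map(re.escape, sorted(retirement_words_set))) + r')\b'
df['topic_vectorized'] = (
    df['comments']
    .str.contains(pattern, case=False, regex=True)
    .map({True: '401k/Retirement', False: 'Other'})
)

print(df[['buggy_topic', 'topic', 'topic_vectorized']])
```

Because the first sample comment contains none of the words, buggy_word_checker labels every row 'Other', which matches the symptom in the question. If the comments column may contain missing values, calling .fillna('') on it first avoids an AttributeError in word_checker.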