Tags: python, dataframe, word-count

Python | Count words in a DataFrame that match a prespecified list of words


I'm trying to count the words in a DataFrame column consisting of speeches. I have created lists of words associated with different themes, for example:

Care = ['safe', 'peace', 'compassion', 'empath', 'care', 'caring', 'protect', 'shield', 'shelter']

Now I would like to count how many times, in total, the words in the "Care" list occur in each speech, and then add a new column at the end of the df with the count for each row.

I'm using this code right now.

df = df.assign(Care=df['speech'].str.count('|'.join(Care)))

But I suspect that it gives me partial matches as well. I would like to get a match only when a word in the speech matches a whole word in my list. Any ideas?
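
The suspicion is easy to check on a toy example. This is only a minimal sketch: the two sample sentences and the shortened word list are made up for illustration, and only the `speech` column name comes from the code above.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the behaviour.
df = pd.DataFrame({"speech": ["we care for the careful", "unsafe streets"]})
Care = ["safe", "care"]  # shortened list for the demo

# str.count treats the joined string as a regular expression and counts
# every non-overlapping match, including matches inside longer words.
partial = df["speech"].str.count("|".join(Care))
print(partial.tolist())  # [2, 1]
```

Here "careful" and "unsafe" each contribute a hit, so the pattern does count partial matches.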


Solution

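  • Assuming the speeches are free of punctuation marks, this might work. It splits each speech on whitespace and counts the tokens that appear in Care exactly (the comparison is case-sensitive):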

    df['count'] = df['speech'].apply(lambda x: len([val for val in x.split() if val in Care]))
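
If the speeches do contain punctuation or mixed case, a regular expression with word boundaries may be an alternative to splitting. The following is a sketch rather than a definitive answer: the sample frame is invented, `pattern` is a name introduced here, and only the `speech` column and the `Care` list come from the question.

```python
import re
import pandas as pd

# Hypothetical sample data, used only to illustrate the approach.
df = pd.DataFrame({
    "speech": [
        "We care. Caring for the careful is safe!",
        "Unsafe streets, no shelter",
    ]
})
Care = ["safe", "peace", "compassion", "empath", "care", "caring",
        "protect", "shield", "shelter"]

# \b anchors the alternation to word boundaries, so "careful" and "Unsafe"
# no longer match; re.escape guards against regex metacharacters in the list.
pattern = r"\b(?:" + "|".join(map(re.escape, Care)) + r")\b"

# Count whole-word, case-insensitive matches per row.
df["Care"] = df["speech"].str.count(pattern, flags=re.IGNORECASE)
print(df["Care"].tolist())  # [3, 1]
```

Note that whole-word matching also means a stem such as "empath" will not match "empathy"; stems would need their own handling if they are meant to match longer words.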