I have created a list of words associated with a certain category. For example:
care = ["safe", "peace", "empathy"]
I also have a dataframe of speeches, each about 450 words long on average. I have counted the number of matches for each category using this line of code:
df['Care'] = df['Speech'].apply(lambda x: len([val for val in x.split() if val in care]))
which gives me the total number of matches per category.
However, I also need the frequency of each individual word in the list. I tried this code to solve my problem:
df['Speech'].str.extractall('({})'.format('|'.join(care)))\
    .iloc[:, 0].str.get_dummies().sum(level=0)
I've tried different methods, but the problem is that partial matches are always included: for example, the list word ham would also be counted inside hammer.
Any ideas on how to solve this?
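One way to avoid the partial matches is to escape each list word and wrap the alternation in `\b` word boundaries before calling extractall. A minimal sketch, assuming a dataframe shaped like the one described (the two sample rows are invented for illustration):

```python
import re
import pandas as pd

care = ["safe", "peace", "empathy", "ham"]
df = pd.DataFrame({"Speech": ["a safe peace", "the hammer and the ham"]})

# \b anchors the pattern at word boundaries, so "ham" no longer matches "hammer";
# re.escape guards against list words containing regex metacharacters
pattern = r"\b({})\b".format("|".join(map(re.escape, care)))

counts = (df["Speech"].str.extractall(pattern)
          .iloc[:, 0].str.get_dummies()
          .groupby(level=0).sum())   # groupby(level=0) replaces the deprecated sum(level=0)
print(counts)
```

This gives one column per matched list word and one row per speech, with "hammer" contributing nothing to the "ham" column.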
I built on Akash's answer and managed to get the frequencies of the pre-specified words stored in the list, counted over the dataframe, by simply adding one line (the membership check against care):
from collections import Counter

word_count = Counter()
for line in df['Speech']:
    for word in line.split(' '):
        if word in care:          # exact match only, so "ham" never hits "hammer"
            word_count[word] += 1

word_count.most_common()
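As a quick sanity check, here is the same loop run end to end on a toy two-row dataframe (the sample rows are invented; the column and list names match the question):

```python
from collections import Counter
import pandas as pd

care = ["safe", "peace", "empathy"]
df = pd.DataFrame({"Speech": ["peace and safe peace", "no matches here"]})

word_count = Counter()
for line in df["Speech"]:
    for word in line.split(" "):
        if word in care:          # exact, whole-word membership test
            word_count[word] += 1

print(word_count.most_common())   # → [('peace', 2), ('safe', 1)]
```

Because the speeches are split on spaces and compared whole-word against the list, partial matches never occur here, only exact hits are tallied.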