python, string-matching, word-frequency

Count the frequency of each word from a list in a dataframe


I have created a list of words associated with a certain category. For example:

care = ["safe", "peace", "empathy"]

I also have a dataframe of speeches, which average about 450 words each. I counted the number of matches for each category with this line of code:

df['Care'] = df['Speech'].apply(lambda x: len([val for val in x.split() if val in care]))

This gives me the total number of matches for each category.

However, I need to review the frequency of each individual word in the list. I tried the following code to solve this:

df.Tal.str.extractall('({})'.format('|'.join(auktoritet)))\
                           .iloc[:, 0].str.get_dummies().sum(level=0)

I've tried different methods, but the problem is that partial matches are always included. For example, hammer would be counted as a match for ham.

Any ideas on how to solve this?


Solution

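  • I built on Akash's answer and managed to get the frequency of each prespecified word in the list, counted across the whole dataframe, by adding a single line. Because each whitespace-separated token is compared against the list as a whole, hammer is not counted as a match for ham.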

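    from collections import Counter
    
    care_words = set(care)  # set membership is faster than scanning the list
    word_count = Counter()
    for line in df['Speech']:
        # split() with no argument splits on any whitespace and skips empty strings
        for word in line.split():
            if word in care_words:
                word_count[word] += 1
    
    # (word, count) pairs, most frequent first
    word_count.most_common()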
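
One limitation of splitting on whitespace is that a token with attached punctuation or different capitalization (for example "Peace," or "Safe!") will not equal a word in the list. A possible variation, sketched here and not part of the original answer, uses a regular expression with \b word boundaries instead. The function name count_listed_words and the sample speeches are hypothetical, and the sketch assumes the words in the list are lowercase. Any iterable of strings works, so df['Speech'] could be passed in place of the sample list.

```python
import re
from collections import Counter


def count_listed_words(speeches, words):
    """Count whole-word, case-insensitive occurrences of each listed word.

    Assumes every entry in `words` is lowercase (hypothetical helper).
    """
    # \b anchors each alternative at word boundaries, so "safe" does not
    # match inside "unsafe" or "safety", and "ham" would not match "hammer".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        flags=re.IGNORECASE,
    )
    # Start every listed word at zero so unseen words still appear in the result.
    counts = Counter({w: 0 for w in words})
    for text in speeches:
        for match in pattern.findall(text):
            counts[match.lower()] += 1
    return counts


care = ["safe", "peace", "empathy"]
# Made-up sample data standing in for df['Speech'].
speeches = [
    "Peace, safe streets and empathy. Safe!",
    "Unsafe policies bring no peace; safety matters.",
]
word_count = count_listed_words(speeches, care)
print(word_count.most_common())
```

With the sample data, "Unsafe" and "safety" are skipped while "Peace," and "Safe!" are counted, which is the whole-word behaviour the question asks for.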