Tags: python, pandas, dataframe, frequency, cpu-word

How to count the frequency of words from a list in a dataframe column?


If I have a dataframe with the following layout:

ID#      Response
1234     Covid-19 was a disaster for my business
3456     The way you handled this pandemic was awesome

I want to be able to count the frequency of specific words from a list.

list_of_strings = ['covid', 'COVID', 'Covid-19', 'pandemic', 'coronavirus']

In the end I want to generate a dictionary like the following:

{'covid': 0, 'COVID': 0, 'Covid-19': 1, 'pandemic': 1, 'coronavirus': 0}

Please help, I am really stuck on how to code this in Python.


Solution

  • For each string, count the number of matches in the Response column.

    {s: int(df['Response'].str.count(s).fillna(0).sum()) for s in list_of_strings}
    

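    Note that Series.str.count treats its argument as a regular expression. You may want to append (?=\b), a positive look-ahead for a word boundary, so that a term only matches where a word ends. Terms that contain regex metacharacters should be escaped first, for example with re.escape.

    Series.str.count returns NA for missing values, so fill those with 0 before summing. Summing over the column then gives the total count for each string.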
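
The approach above can be put together as a small runnable sketch. It rebuilds the two rows from the question and adds a third row with a missing response, which is an assumption made here only to show why fillna(0) is needed. The helper name count_terms is hypothetical, and using re.escape plus the (?=\b) look-ahead is one possible choice, not the only one.

```python
import re

import pandas as pd

# Sample data from the question, plus one hypothetical row with a missing
# response (not in the original) to exercise the fillna(0) step.
df = pd.DataFrame({
    "ID#": [1234, 3456, 5678],
    "Response": [
        "Covid-19 was a disaster for my business",
        "The way you handled this pandemic was awesome",
        None,
    ],
})

list_of_strings = ["covid", "COVID", "Covid-19", "pandemic", "coronavirus"]


def count_terms(series, terms):
    """Return {term: total number of case-sensitive matches in series}."""
    counts = {}
    for term in terms:
        # Escape the term so it is matched literally, then require a word
        # boundary right after it so that "pandemic" does not match "pandemics".
        pattern = re.escape(term) + r"(?=\b)"
        # str.count gives NA for missing values; treat those as zero matches.
        counts[term] = int(series.str.count(pattern).fillna(0).sum())
    return counts


counts = count_terms(df["Response"], list_of_strings)
print(counts)
# {'covid': 0, 'COVID': 0, 'Covid-19': 1, 'pandemic': 1, 'coronavirus': 0}
```

Matching here is case-sensitive, which is why 'covid' and 'COVID' both stay at 0 even though 'Covid-19' appears in the first row.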