Tags: csv, python-2.7, count, frequency-analysis, word-frequency

Word frequency count based on two words using Python


There are many resources online that show how to do a word count for a single word, like this and this and this and others...
But I was not able to find a concrete example of counting the frequency of two-word sequences.

I have a CSV file that has some strings in it.

FileList = "I love TV show makes me happy, I love also comedy show makes me feel like flying"

So I want the output to be like:

wordscount =  {"I love": 2, "show makes": 2, "makes me" : 2 }

Of course I will have to strip all the commas, question marks, and other punctuation: {!, , ", ', ?, ., (,), [, ], ^, %, #, @, &, *, -, _, ;, /, \, |, }

I will also remove some stop words which I found here just to get more concrete data from the text.

How can I achieve this result using Python?

Thanks!


Solution

  • >>> from collections import Counter
    >>> import re
    >>> 
    >>> sentence = "I love TV show makes me happy, I love also comedy show makes me feel like flying"
    >>> words = re.findall(r'\w+', sentence)
    >>> two_words = [' '.join(ws) for ws in zip(words, words[1:])]
    >>> wordscount = {w:f for w, f in Counter(two_words).most_common() if f > 1}
    >>> wordscount
    {'show makes': 2, 'makes me': 2, 'I love': 2}
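
The session above does not touch the stop words or the CSV input mentioned in the question. Here is a minimal sketch of one way to add both, not a definitive implementation. The names `word_pairs` and `pair_counts`, the in-memory sample standing in for the CSV file, and the one-word stop list are all hypothetical, since the question shows neither the file nor the stop-word list. The pattern `[^\W_]+` is used instead of `\w+` so that underscores are treated as punctuation too, as the question asks.

```python
import csv
import io
import re
from collections import Counter


def word_pairs(text, stop_words=frozenset()):
    """Return adjacent word pairs from text, skipping stop words."""
    # [^\W_]+ matches runs of letters and digits only, so punctuation
    # (including "_" and "'") acts as a separator and is discarded.
    words = re.findall(r"[^\W_]+", text)
    # Stop words are compared case-insensitively; the original case is kept.
    kept = [w for w in words if w.lower() not in stop_words]
    return [" ".join(pair) for pair in zip(kept, kept[1:])]


def pair_counts(rows, stop_words=frozenset(), min_count=2):
    """Count word pairs over every cell of every row; keep the frequent ones."""
    counts = Counter()
    for row in rows:
        for cell in row:
            # Pairs are built per cell, so they never span two cells.
            counts.update(word_pairs(cell, stop_words))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Stand-in for the CSV file: one quoted cell holding the question's sentence.
sample = io.StringIO(
    '"I love TV show makes me happy, I love also comedy show makes me feel like flying"\n'
)
# Hypothetical stop-word list; the list linked in the question is not shown.
stop_words = {"also"}

wordscount = pair_counts(csv.reader(sample), stop_words)
print(wordscount)
```

For a real file, `csv.reader(open(path))` would take the place of the `io.StringIO` sample. Note that dropping stop words before pairing makes words adjacent that were not adjacent in the text ("love comedy" here). If that is not wanted, build the pairs first and then discard any pair that contains a stop word.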