python nltk frequency-distribution

Frequency Distribution of Bigrams


I have done the following:

import nltk


words = nltk.corpus.brown.words()
freq = nltk.FreqDist(words)

With this I am able to find the frequency of certain words in the Brown corpus, for example:

freq["the"]
62713

Now I want to find the frequency of specific bigrams, so I tried:

bigrams = nltk.bigrams(words)
freqbig = nltk.FreqDist(bigrams)

But for every bigram I look up, I always get 0. For example:

freqbig["the man"]
0

What am I doing wrong?


Solution

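  • nltk.bigrams yields each bigram as a tuple of two words, so the keys of freqbig are tuples, not strings. Look up a bigram with a tuple as the key, not a str: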

    freqbig[("the", "man")]
    

    OUTPUT

    128
    

    If you want to pass strings, you could create an auxiliary function which takes care of it:

    def get_frequency(my_string):
        # split() with no argument also tolerates repeated whitespace
        return freqbig[tuple(my_string.split())]
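To see why the string key returns 0, here is a minimal sketch that runs without downloading the Brown corpus. It is not the NLTK code itself: it assumes collections.Counter as a stand-in for nltk.FreqDist (which subclasses Counter) and zip(words, words[1:]) as a stand-in for nltk.bigrams. The token list and the name bigram_counts are made up for illustration.

```python
from collections import Counter

# Hypothetical toy token list standing in for nltk.corpus.brown.words().
words = ["the", "man", "saw", "the", "man", "and", "the", "dog"]

# Pair each token with its successor. Like nltk.bigrams(words), this
# yields 2-tuples such as ("the", "man"), so the counter's keys are tuples.
bigram_counts = Counter(zip(words, words[1:]))


def get_frequency(my_string):
    # Convert "the man" into the tuple key ("the", "man") before the lookup.
    return bigram_counts[tuple(my_string.split())]


print(bigram_counts[("the", "man")])  # tuple key: matches the stored bigrams
print(bigram_counts["the man"])       # str key: never stored, so the count is 0
print(get_frequency("the man"))       # helper builds the tuple key for you
```

A missing key does not raise an error here; the counter simply reports 0, which is why the original lookup failed silently instead of pointing at the wrong key type.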