python nlp nltk n-gram collocation

nltk quadgram collocation finder


I am seeing multiple questions and answers saying that NLTK collocation finding cannot be done beyond bigrams and trigrams.

For example, this one: How to get n-gram collocations and association in python nltk?

I see that there is something called

nltk.QuadgramCollocationFinder

Similar to

nltk.BigramCollocationFinder and nltk.TrigramCollocationFinder

But at the same time, I cannot see anything like

nltk.collocations.QuadgramAssocMeasures()

similar to nltk.collocations.BigramAssocMeasures() and nltk.collocations.TrigramAssocMeasures()

What is the purpose of nltk.QuadgramCollocationFinder if it's not possible (without hacks) to find n-grams beyond bigrams and trigrams?

Maybe I am missing something.

Thanks,

Update: adding the code as per input from Alvas. This now works:

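import nltk
from nltk.collocations import BigramCollocationFinder, QuadgramCollocationFinder
from nltk.corpus import PlaintextCorpusReader
from nltk.metrics.association import QuadgramAssocMeasures

bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()
quadgram_measures = QuadgramAssocMeasures()

# apply_ngram_filter removes n-grams for which the function returns True,
# so this keeps only n-grams that contain the word 'crazy'.
the_filter = lambda *w: 'crazy' not in w

# `corpus` must be a sequence of word tokens (for example, the output of a
# corpus reader's .words()); its definition is not shown here.
finder = BigramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)  # ignore bigrams seen fewer than 3 times
finder.apply_ngram_filter(the_filter)
print(finder.nbest(bigram_measures.likelihood_ratio, 10))

finder = QuadgramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)  # ignore quadgrams seen fewer than 3 times
finder.apply_ngram_filter(the_filter)
print(finder.nbest(quadgram_measures.likelihood_ratio, 10))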

Solution

  • From the NLTK repo, QuadgramAssocMeasures lives in nltk.metrics.association, so import it from there:

    from nltk.metrics.association import QuadgramAssocMeasures
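
To see that import working end to end, here is a minimal sketch that should run without downloading any corpus. The token list is made-up toy data (an assumption, not from the question), and raw_freq and pmi are used here only as two example scoring functions; likelihood_ratio from the question could be passed to nbest in the same way.

```python
from nltk.collocations import QuadgramCollocationFinder
from nltk.metrics.association import QuadgramAssocMeasures

# Hypothetical toy data standing in for a real tokenized corpus.
text = ("the quick brown fox jumps over the lazy dog . " * 3
        + "the quick brown fox sleeps .")
tokens = text.split()

quadgram_measures = QuadgramAssocMeasures()

finder = QuadgramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(3)  # drop 4-grams seen fewer than 3 times
# apply_ngram_filter removes n-grams for which the function returns True,
# so this keeps only 4-grams that contain the word 'fox'.
finder.apply_ngram_filter(lambda *w: "fox" not in w)

# Rank the surviving 4-grams by relative frequency, then score them by PMI.
top_by_freq = finder.nbest(quadgram_measures.raw_freq, 5)
scored_by_pmi = finder.score_ngrams(quadgram_measures.pmi)

print(top_by_freq[0])
for ngram, score in scored_by_pmi:
    print(ngram, round(score, 3))
```

On a real corpus you would replace tokens with your own word list, for example the output of a corpus reader's words() method.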