I am seeing multiple questions and answers saying that NLTK collocations cannot be computed beyond bigrams and trigrams.
For example, this one: How to get n-gram collocations and association in python nltk?
I see that there is something called
nltk.QuadgramCollocationFinder
similar to
nltk.BigramCollocationFinder and nltk.TrigramCollocationFinder
but at the same time I cannot see anything like
nltk.collocations.QuadgramAssocMeasures()
similar to nltk.collocations.BigramAssocMeasures() and nltk.collocations.TrigramAssocMeasures().
What is the purpose of nltk.QuadgramCollocationFinder if it's not possible (without hacks) to find n-grams beyond bigrams and trigrams?
Maybe I am missing something.
Thanks,
Adding the code and updating the question as per input from Alvas; this now works:
import nltk
from nltk.collocations import *
from nltk.corpus import PlaintextCorpusReader
from nltk.metrics.association import QuadgramAssocMeasures
bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()
quadgram_measures = QuadgramAssocMeasures()
# apply_ngram_filter drops every n-gram for which the function returns True,
# so this keeps only n-grams that contain 'crazy'
the_filter = lambda *w: 'crazy' not in w
# corpus must already be defined as a sequence of word tokens
finder = BigramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)  # ignore n-grams seen fewer than 3 times
finder.apply_ngram_filter(the_filter)
print(finder.nbest(bigram_measures.likelihood_ratio, 10))
finder = QuadgramCollocationFinder.from_words(corpus)
finder.apply_freq_filter(3)
finder.apply_ngram_filter(the_filter)
print(finder.nbest(quadgram_measures.likelihood_ratio, 10))
From the repo:
from nltk.metrics.association import QuadgramAssocMeasures
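For anyone who wants to try this without a corpus at hand, here is a minimal self-contained sketch of the same quadgram pipeline. The toy token list and the variable names (tokens, keep_crazy, best) are made up for illustration, and it assumes an NLTK version that ships QuadgramAssocMeasures in nltk.metrics.association:

```python
from nltk.collocations import QuadgramCollocationFinder
from nltk.metrics.association import QuadgramAssocMeasures

# Toy data (hypothetical): one 4-gram repeated three times, separated by fillers.
tokens = ("this is crazy talk x "
          "this is crazy talk y "
          "this is crazy talk z").split()

quadgram_measures = QuadgramAssocMeasures()

# The filter returns True for n-grams to DROP, so this keeps
# only the 4-grams that contain the word 'crazy'.
keep_crazy = lambda *w: 'crazy' not in w

finder = QuadgramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(3)            # drop 4-grams seen fewer than 3 times
finder.apply_ngram_filter(keep_crazy)  # drop 4-grams without 'crazy'

# Rank the surviving 4-grams by likelihood ratio and take the top 10.
best = finder.nbest(quadgram_measures.likelihood_ratio, 10)
print(best)
```

On this toy input only one 4-gram occurs three times, so it is the only one that survives both filters. On a real corpus you would pass your own token list to from_words.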