nlp, nltk, smoothing, language-model

Define and use a new smoothing method in NLTK language models


I'm trying to develop and test a new smoothing method for language models. I'm using the NLTK tools and don't want to rewrite everything from scratch. Is there any way to define and use my own smoothing method in NLTK's models?

Edit: I'm trying to do something like this:

def my_smoothing_method(model) :
    # some code using model (MLE) count

model = nltk.lm.MLE(n, smoothing_method=my_smoothing_method)
model.fit(train)

Solution

  • If you look at the definition of MLE in the NLTK source, you will see that it takes no smoothing function. The same file defines other models, though, and one of them may fit your needs.

    The InterpolatedLanguageModel (in the same file) does accept a smoothing class, which needs to be a subclass of Smoothing and implement alpha_gamma(word, context) and unigram_score(word):

    from nltk.lm.models import InterpolatedLanguageModel
    model = InterpolatedLanguageModel(smoothing_cls=MySmoothing, order=n)  # MySmoothing: your Smoothing subclass
    

    So if you really need to add this functionality to the MLE class, you could do something like the following, though I am not sure it is a good idea:

    from nltk.lm.api import LanguageModel

    class MLE_with_smoothing(LanguageModel):
        """Class for providing MLE ngram model scores with custom smoothing.

        Inherits initialization from LanguageModel.
        """

        def unmasked_score(self, word, context=None):
            """Returns the smoothed MLE score for a word given a context.

            Args:
            - word is expected to be a string
            - context is expected to be something reasonably convertible to a tuple
            """
            # Unsmoothed relative frequency of `word` after `context`.
            freq = self.context_counts(context).freq(word)
            # Apply your smoothing to `freq` here before returning it.
            return freq
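
Both routes described above can be sketched roughly as follows, assuming a recent NLTK 3.x release. The names FixedWeightSmoothing, SmoothedMLE and add_one, the weight parameter, and the toy sentences are all made up for illustration and are not part of NLTK. The first class is a Smoothing subclass handed to InterpolatedLanguageModel. The second is a LanguageModel subclass that accepts a smoothing function, which is roughly the interface asked for in the question.

```python
from nltk.lm.api import LanguageModel, Smoothing
from nltk.lm.models import InterpolatedLanguageModel
from nltk.lm.preprocessing import padded_everygram_pipeline


class FixedWeightSmoothing(Smoothing):
    """Hypothetical smoothing: mix the MLE estimate with the lower order."""

    def __init__(self, vocabulary, counter, weight=0.7, **kwargs):
        super().__init__(vocabulary, counter, **kwargs)
        self.weight = weight  # share of the score taken from the full context

    def unigram_score(self, word):
        # Add-one estimate, so unseen words keep some probability mass.
        unigrams = self.counts.unigrams
        return (unigrams[word] + 1) / (unigrams.N() + len(self.vocab))

    def alpha_gamma(self, word, context):
        # The model combines these as alpha + gamma * (shorter-context score).
        alpha = self.weight * self.counts[context].freq(word)
        return alpha, 1 - self.weight


class SmoothedMLE(LanguageModel):
    """Hypothetical MLE variant that delegates scoring to a callable."""

    def __init__(self, order, smoothing_method, **kwargs):
        super().__init__(order, **kwargs)
        self.smoothing_method = smoothing_method

    def unmasked_score(self, word, context=None):
        return self.smoothing_method(self, word, context)


def add_one(model, word, context):
    # Example smoothing function built on the model's raw counts.
    counts = model.context_counts(context)
    return (counts[word] + 1) / (counts.N() + len(model.vocab))


sentences = [["a", "b", "c"], ["a", "c", "b"]]  # toy training data

# The pipeline returns one-shot iterators, so build it once per model.
train, vocab = padded_everygram_pipeline(2, sentences)
interpolated = InterpolatedLanguageModel(FixedWeightSmoothing, 2)
interpolated.fit(train, vocab)

train, vocab = padded_everygram_pipeline(2, sentences)
smoothed = SmoothedMLE(2, smoothing_method=add_one)
smoothed.fit(train, vocab)

print(interpolated.score("b", ["a"]), smoothed.score("b", ["a"]))
```

The Smoothing route is probably the safer of the two, since it stays within the interface InterpolatedLanguageModel expects. The callable-based subclass is more flexible, but it is then up to the smoothing function to return a proper probability distribution.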