information-retrieval, smoothing

How to smooth unigrams


I have a unigram language model and I want to smooth the counts. Is add-one smoothing the only way, or can I use some other smoothing method? I don't think we can use Kneser-Ney, as that is for n-grams with n >= 2. Do you know any other smoothing methods?

How about Witten-Bell?
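For reference, here is a minimal sketch of the two methods the question mentions, applied to a unigram model. It assumes a fixed vocabulary `vocab` (a set containing every training token). The function names are hypothetical. The Witten-Bell version uses one common formulation for the lowest order: it interpolates the maximum-likelihood estimate with a uniform distribution over the vocabulary, weighting the uniform part by T / (N + T), where T is the number of distinct types seen and N the number of tokens.

```python
from collections import Counter


def add_one_unigrams(tokens, vocab):
    """Laplace (add-one) smoothing: P(w) = (c(w) + 1) / (N + V).

    Assumes every token in `tokens` is a member of `vocab`.
    """
    counts = Counter(tokens)
    n, v = len(tokens), len(vocab)
    # Counter returns 0 for unseen words, so they get probability 1 / (N + V)
    return {w: (counts[w] + 1) / (n + v) for w in vocab}


def witten_bell_unigrams(tokens, vocab):
    """Witten-Bell for unigrams, backing off to a uniform distribution:
    P(w) = (c(w) + T * (1 / V)) / (N + T), with T = number of distinct types seen.

    Assumes every token in `tokens` is a member of `vocab`.
    """
    counts = Counter(tokens)
    n, v = len(tokens), len(vocab)
    t = len(counts)  # number of distinct word types observed in training
    return {w: (counts[w] + t / v) / (n + t) for w in vocab}
```

Add-one adds the same pseudo-count to every vocabulary word regardless of the data. Witten-Bell instead sets the amount of mass given to the uniform part from how many new types appeared in the training data, so a corpus with many distinct words reserves more mass for unseen ones.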


Solution

  • For unigram smoothing, Good-Turing would be optimal, and it's easy to apply!

    http://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation

    For higher orders, modified interpolated Kneser-Ney is a good choice.
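A minimal sketch of basic Good-Turing for unigrams follows. It assumes a known vocabulary size `vocab_size` (at least the number of observed types) and a non-empty token list. It uses the plain estimator c* = (c + 1) N_{c+1} / N_c, where N_c is the number of types seen exactly c times, and reserves N_1 / N of the probability mass for unseen types. Because N_{c+1} is often zero for large c, the adjusted count is applied only below a cutoff `k`, and the seen mass is then renormalized. This is not the Simple Good-Turing variant (Gale and Sampson), which smooths the N_c values with a regression first. The function name and the default `k=5` are illustrative choices.

```python
from collections import Counter


def good_turing_unigrams(tokens, vocab_size, k=5):
    """Basic Good-Turing unigram estimates.

    Returns (probs, p_unseen_each): a dict of probabilities for seen words and
    the probability assigned to each individual unseen vocabulary word.
    Assumes `tokens` is non-empty and vocab_size >= number of distinct tokens.
    """
    counts = Counter(tokens)
    n = sum(counts.values())
    # N_c: how many distinct types occur exactly c times
    freq_of_freq = Counter(counts.values())

    # Total mass for all unseen types is N_1 / N, split evenly among them
    unseen_types = vocab_size - len(counts)
    p_unseen_total = freq_of_freq[1] / n if unseen_types > 0 else 0.0
    p_unseen_each = p_unseen_total / unseen_types if unseen_types > 0 else 0.0

    def adjusted(c):
        # c* = (c + 1) * N_{c+1} / N_c for small counts with N_{c+1} > 0;
        # larger counts are treated as reliable and kept unchanged
        if c < k and freq_of_freq[c + 1] > 0:
            return (c + 1) * freq_of_freq[c + 1] / freq_of_freq[c]
        return float(c)

    adjusted_counts = {w: adjusted(c) for w, c in counts.items()}
    # Renormalize so seen words share exactly 1 - (unseen mass)
    seen_mass = 1.0 - p_unseen_total
    total_adjusted = sum(adjusted_counts.values())
    probs = {w: seen_mass * c_star / total_adjusted
             for w, c_star in adjusted_counts.items()}
    return probs, p_unseen_each
```

For example, with `"a a a b b c d".split()` and `vocab_size=6`, there are two singletons out of seven tokens, so the unseen mass is 2/7 and each of the two unseen types gets 1/7.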