
Understanding maximum likelihood in NLP


I am trying to understand what maximum likelihood in NLP is. I was looking at this presentation:

http://www.phontron.com/slides/nlp-programming-en-01-unigramlm.pdf (page 9)

and I saw the same equation in Foundations of Statistical Natural Language Processing by Manning and Schütze.

Now, the way I understand it, MLE is about this:

I know the outcome of an experiment, I know the underlying distribution, but I don't know the probability of a single event. MLE helps me find that probability (or, more generally, an unknown parameter) by finding the value for the parameter that makes my observations most likely.

So MLE tells me that the probability of observing my data is highest when the probability of a single event is x.

Now if that is true, why is there no sign of calculus on that slide? Why is the MLE in this case calculated by a simple fraction? I don't see what this has to do with MLE.

I thought MLE was a maximization problem...?


Solution

  • MLE is indeed a maximization problem. The slides skip over the calculation and just state the result of the MLE. If you want to see the full derivation, you can look at page 3 here, for example: http://statweb.stanford.edu/~susan/courses/s200/lectures/lect11.pdf

    This link explains how to find the maximum likelihood estimator of the parameters of a multinomial distribution, and the same type of calculation also leads to the result you saw in the slides.

    The n in the link corresponds to c(w1,…,wi−1) from your slides (as this is the total number of cases), and the x_i in the link corresponds to c(w1,…,wi) from your slides (as this is the count of the specific case you want, among all the cases).
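    You can also convince yourself numerically that the count fraction is the maximizer. The sketch below (with a toy corpus of my own choosing, not from the slides) computes the closed-form unigram MLE c(w)/N and then brute-force searches a grid of candidate probabilities for the one with the highest log-likelihood; the two agree up to the grid resolution.

    ```python
    import math

    # Toy corpus: the "event" is seeing the word "the".
    words = ["the", "cat", "sat", "on", "the", "mat"]
    n = len(words)
    count_the = words.count("the")  # 2

    # Closed-form MLE, as on the slide: P(w) = c(w) / N
    p_mle = count_the / n  # 2/6 ≈ 0.333

    def log_likelihood(p):
        # Bernoulli view of the corpus: each token is either "the"
        # (probability p) or anything else (probability 1 - p).
        return count_the * math.log(p) + (n - count_the) * math.log(1 - p)

    # Brute-force maximization over a grid of candidate values for p.
    candidates = [i / 1000 for i in range(1, 1000)]
    p_best = max(candidates, key=log_likelihood)

    print(p_mle)   # 0.333...
    print(p_best)  # nearest grid point to 1/3
    ```

    The grid search stands in for the calculus step: setting the derivative of the log-likelihood to zero gives exactly p = c(w)/N, which is why the slide can present the MLE as a simple fraction.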