AFAIK, a generative language model is nothing but a probability distribution over some vocabulary. I am wondering how to use this probability distribution to generate a stream of words, i.e. language?
If I always pick the word with the biggest probability, I will always get the same word, because the distribution is fixed.
I am not sure if I understand it correctly. Could anyone provide a concrete operational example?
First of all, you don't pick the word with the highest probability. You pick a random word, but not uniformly: you pick it with the probabilities given by the model.
So, if you have 2 words in a model, "yes" and "no", and the probability distribution is 2/3 "yes", 1/3 "no", then the generated text may look like this:
yes no no yes yes no yes yes yes no yes yes yes
I.e., you'll have approximately 2/3 "yes" in the text and 1/3 "no".
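For instance, here's a minimal sketch of this in Python; the two-word vocabulary and the probabilities are the ones from the example above:

```python
import random

# Toy model: P("yes") = 2/3, P("no") = 1/3.
vocab = ["yes", "no"]
probs = [2/3, 1/3]

# Draw 13 words, each sampled independently with the model's probabilities.
stream = random.choices(vocab, weights=probs, k=13)
print(" ".join(stream))
# e.g.: yes no no yes yes no yes yes yes no yes yes yes
```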
EDIT
Here's a simple way to sample from the distribution: generate a uniform random number in [0, 1), walk through the vocabulary accumulating each word's probability, and emit the first word whose accumulated weight exceeds the generated number.
Here's an example:
Suppose you've generated 0.8. You start from "yes", and the accumulated probability weight is 0.67; that is not greater than 0.8, so you take the next word, "no", and get the accumulated weight 1.0, which is greater than 0.8, so you emit "no".
Suppose next time you have 0.5; since 0.5 is already less than 0.67, you emit "yes".
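A minimal Python sketch of that cumulative-weight loop (the sample function and the two-word model are just for illustration):

```python
import random

def sample(vocab_probs):
    """Pick one word: accumulate probabilities until the running
    total exceeds a uniform random draw, then emit that word."""
    r = random.random()              # uniform draw in [0, 1), e.g. 0.8 or 0.5
    cumulative = 0.0
    for word, p in vocab_probs:
        cumulative += p              # accumulated probability weight
        if r < cumulative:           # first word whose weight passes r
            return word
    return vocab_probs[-1][0]        # guard against floating-point rounding

model = [("yes", 2/3), ("no", 1/3)]
print(" ".join(sample(model) for _ in range(13)))
```

With r = 0.8 the loop passes "yes" (0.67 is not greater than 0.8) and stops at "no" (1.0 is); with r = 0.5 it stops immediately at "yes".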