AFAIK, a generative language model is nothing but a probability distribution over some vocabulary. I am wondering how to use this probability distribution to generate a stream of words, i.e. language?
If I always pick the word with the biggest probability, I will always get the same word, because the distribution is fixed.
I am not sure if I understand it correctly. Could anyone provide a concrete operational example?
First of all, you don't pick the word with the highest probability. You pick a random word, but not uniformly: you pick it with the probabilities given by the model.
So, if you have 2 words in a model, "yes" and "no", and the probability distribution is 2/3 "yes", 1/3 "no", then the generated text may look like this:
yes no no yes yes no yes yes yes no yes yes yes
I.e., you'll have approximately 2/3 "yes" in the text and 1/3 "no".
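For instance, here's a minimal sketch of this in Python; the two-word vocabulary and the probabilities are the ones from the example above:

```python
import random

# Toy model: P("yes") = 2/3, P("no") = 1/3.
vocab = ["yes", "no"]
probs = [2/3, 1/3]

# Draw 13 words, each sampled independently with the model's probabilities.
stream = random.choices(vocab, weights=probs, k=13)
print(" ".join(stream))
# e.g.: yes no no yes yes no yes yes yes no yes yes yes
```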
EDIT
Here's a simple way to sample from the distribution: generate a uniform random number in [0, 1), walk through the vocabulary accumulating each word's probability, and emit the first word whose accumulated weight exceeds the generated number.
Here's an example:
Suppose you've generated 0.8. You start from "yes", and the accumulated probability weight is 0.67; that is not greater than 0.8, so you take the next word, "no", and get the accumulated weight 1.0, which is greater than 0.8, so you emit "no".
Suppose next time you have 0.5; since 0.5 is already less than 0.67, you emit "yes".
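A minimal Python sketch of that cumulative-weight loop (the sample function and the two-word model are just for illustration):

```python
import random

def sample(vocab_probs):
    """Pick one word: accumulate probabilities until the running
    total exceeds a uniform random draw, then emit that word."""
    r = random.random()              # uniform draw in [0, 1), e.g. 0.8 or 0.5
    cumulative = 0.0
    for word, p in vocab_probs:
        cumulative += p              # accumulated probability weight
        if r < cumulative:           # first word whose weight passes r
            return word
    return vocab_probs[-1][0]        # guard against floating-point rounding

model = [("yes", 2/3), ("no", 1/3)]
print(" ".join(sample(model) for _ in range(13)))
```

With r = 0.8 the loop passes "yes" (0.67 is not greater than 0.8) and stops at "no" (1.0 is); with r = 0.5 it stops immediately at "yes".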