
Tensorflow multinomial distribution with eager execution


I am coming from this tutorial, which uses a multinomial distribution in eager execution to get a final prediction for the next character for text generation, based on a predictions tensor coming from our RNN.

# using a multinomial distribution to predict the character returned by the model
temperature = 0.5
predictions = predictions / temperature
predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()

My questions are:

  1. Isn't temperature (here 0.5) just scaling all predictions? Why does it influence the multinomial selection then?

    [0.2, 0.4, 0.3, 0.1]/temperature = [0.4, 0.8, 0.6, 0.2]

    Doesn't the multinomial normalize the probabilities? If so, wouldn't scaling just increase the probability of each character, with a limit at 1?

  2. What does [-1, 0].numpy() do? I am completely lost with this one.

Any hints are appreciated.


Solution

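    1. tf.multinomial treats its input as logits: each slice [i, :] represents the unnormalized log-probabilities for all classes. The values are exponentiated and then normalized before sampling, so dividing by the temperature is not a plain rescaling of probabilities.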

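    Thus, the smaller a probability is in the first place, the smaller it becomes for temperatures smaller than 1, and the larger it becomes for temperatures larger than 1. For temperature 0.5, the ratio of each unnormalized weight before scaling to its weight after scaling is lowest for the largest value, so the largest value gains the most: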

    math.exp(0.4) / math.exp(0.8) = 0.6703
    math.exp(0.3) / math.exp(0.6) = 0.7408
    math.exp(0.2) / math.exp(0.4) = 0.8187
    math.exp(0.1) / math.exp(0.2) = 0.9048
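To make the effect concrete, here is a minimal sketch in plain NumPy rather than TensorFlow. It assumes the sampler applies a softmax to the logits, which matches the "unnormalized log-probabilities" description; the `softmax` helper is a name introduced only for this illustration.

```python
import numpy as np

def softmax(logits):
    """Turn unnormalized log-probabilities into probabilities (illustrative helper)."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()

logits = np.array([0.2, 0.4, 0.3, 0.1])  # values from the question

# Lower temperatures sharpen the distribution, higher ones flatten it.
for temperature in (2.0, 1.0, 0.5, 0.1):
    probs = softmax(logits / temperature)
    print(temperature, np.round(probs, 3))
```

The most likely class stays the same at every temperature; only how strongly it dominates changes. In newer TensorFlow releases tf.multinomial is deprecated in favor of tf.random.categorical, which also takes logits.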
    
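    2. [-1, 0].numpy() just extracts the value from the multinomial tensor: [-1, 0] selects the last row (the last batch entry) and the first column (the first sample), and .numpy() converts the resulting eager tensor to a NumPy value.

    For example, it turns

    tf.multinomial(predictions, num_samples=1)
    tf.Tensor([[3]], shape=(1, 1), dtype=int64)

    into 3.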
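The indexing itself can be tried without TensorFlow. The sketch below uses NumPy arrays as stand-ins for the sampled tensor, on the assumption that eager tensors index the same way for this case; the array contents are made up for illustration.

```python
import numpy as np

# Stand-in for the sampled ids, shape (batch_size, num_samples).
single = np.array([[3]], dtype=np.int64)
print(single[-1, 0])  # the only element: last row, first column

# Hypothetical larger result: 2 batch entries, 2 samples each.
samples = np.array([[1, 0],
                    [3, 2]], dtype=np.int64)
predicted_id = samples[-1, 0]  # last batch entry, first sample
print(predicted_id)
```

With a batch size of 1 and num_samples=1, as in the question, [-1, 0] and [0, 0] pick the same element.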