tensorflow · language-model · softmax

What are softmax_w and softmax_b in this document?


I'm new to TensorFlow and need to train a language model, but I ran into some difficulties while reading the documentation, shown below.

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)

I don't understand why this line is needed,

logits = tf.matmul(output, softmax_w) + softmax_b

From what I've learned, once the output is computed and the target_words are known, we can work out the loss directly. The pseudo-code seems to add an extra layer, and softmax_w and softmax_b are not mentioned anywhere before this snippet. I suspect I may have missed something basic in asking such a simple question.

Please point me in the right direction, and any suggestions are highly appreciated. Thanks a lot.


Solution

  • All that code is doing is adding an extra linear transformation before computing the softmax. It projects the LSTM output (of size lstm_size) onto the vocabulary, so the logits have one score per word. softmax_w should be a tf.Variable containing a matrix of weights of shape [lstm_size, vocab_size]; softmax_b should be a tf.Variable containing a bias vector of length vocab_size.

    Take a look at the softmax example in this tutorial for more details: https://www.tensorflow.org/versions/r0.10/tutorials/mnist/beginners/index.html#softmax-regressions
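    A minimal NumPy sketch of what that projection layer does (the sizes here are hypothetical, and `output` stands in for one LSTM output; in real TensorFlow code `softmax_w` and `softmax_b` would be `tf.Variable`s learned during training):

    ```python
    import numpy as np

    # Hypothetical sizes: lstm_size hidden units projected onto a vocab_size vocabulary.
    lstm_size, vocab_size, batch_size = 4, 10, 3
    rng = np.random.default_rng(0)

    # softmax_w / softmax_b play the role of the tf.Variables from the answer:
    # a weight matrix and a bias vector for the final linear (projection) layer.
    softmax_w = rng.standard_normal((lstm_size, vocab_size))
    softmax_b = np.zeros(vocab_size)

    # "output" stands in for the LSTM output at one step: [batch_size, lstm_size].
    output = rng.standard_normal((batch_size, lstm_size))

    # The extra linear transformation maps hidden states to vocabulary scores...
    logits = output @ softmax_w + softmax_b  # shape: [batch_size, vocab_size]

    # ...and softmax turns each row of scores into a probability distribution
    # over the next word (subtracting the max for numerical stability).
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probabilities = exp / exp.sum(axis=1, keepdims=True)

    print(probabilities.shape)        # (3, 10)
    print(probabilities.sum(axis=1))  # each row sums to 1
    ```

    Without this layer, `output` would only have `lstm_size` components, so there would be nothing to compare against a vocabulary-sized target distribution; the projection is what makes next-word prediction possible.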