I'm new to TensorFlow and need to train a language model, but I've run into some difficulties while reading the documentation, as shown below:
import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)
    # The LSTM output can be used to make next-word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)
I don't understand why this line is needed:
logits = tf.matmul(output, softmax_w) + softmax_b
From what I've learned, once the output is computed and the target_words are known, we can work out the loss directly. So it seems the pseudocode adds an extra layer here. Also, what are softmax_w and softmax_b? They aren't mentioned anywhere before this point. I feel I may be missing something basic by asking such a simple question.
Please point me in the right direction; any suggestions are highly appreciated. Thanks a lot.
All that code is doing is adding an extra linear transformation before computing the softmax. softmax_w should be a tf.Variable containing a matrix of weights, and softmax_b should be a tf.Variable containing a bias vector. This projection is needed because the LSTM output has only lstm_size dimensions, while the softmax needs one score (logit) per word in the vocabulary in order to produce next-word probabilities.
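For concreteness, here is a minimal sketch of how those two variables might be defined, using the r0.10-era API; lstm_size and vocab_size are assumed hyperparameters with made-up values, and the placeholder stands in for the LSTM output from the loop above:

import tensorflow as tf

lstm_size = 200     # hidden-state size of the LSTM (hypothetical value)
vocab_size = 10000  # number of words in the vocabulary (hypothetical value)

# Weight matrix and bias vector for the softmax projection.
softmax_w = tf.Variable(
    tf.truncated_normal([lstm_size, vocab_size], stddev=0.1), name="softmax_w")
softmax_b = tf.Variable(tf.zeros([vocab_size]), name="softmax_b")

# Stand-in for the LSTM output at one step, shape [batch_size, lstm_size].
output = tf.placeholder(tf.float32, [None, lstm_size])

# The projection turns lstm_size features into vocab_size logits, one per
# candidate next word; softmax then normalizes them into probabilities.
logits = tf.matmul(output, softmax_w) + softmax_b
probabilities = tf.nn.softmax(logits)

Note that in practice the loss is usually computed from the logits directly (e.g. with tf.nn.sparse_softmax_cross_entropy_with_logits) rather than from the probabilities, for numerical stability.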
Take a look at the softmax example in this tutorial for more details: https://www.tensorflow.org/versions/r0.10/tutorials/mnist/beginners/index.html#softmax-regressions