https://www.tensorflow.org/tutorials/recurrent#truncated_backpropagation
Here, the official TF document says:
"In order to make the learning process tractable, it is common practice to create an 'unrolled' version of the network, which contains a fixed number (num_steps) of LSTM inputs and outputs."
and the document shows the following code:
words = tf.placeholder(tf.int32, [batch_size, num_steps])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    output, state = lstm(words[:, i], state)
    # The rest of the code.
    # ...

final_state = state

# After some code lines...

numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run(
        [final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state,
                   words: current_batch_of_words})
    total_loss += current_loss
These lines implement the truncated backpropagation (BPTT) part, but I'm not sure this code is actually needed. Does TensorFlow (I'm using 1.3) perform proper backpropagation automatically, even if the hand-written backprop part is absent? Does adding the BPTT implementation code increase prediction accuracy noticeably?
The code above feeds the state returned by the RNN layer at the previous timestep into the RNN cell at the next timestep. According to the official documentation, an RNN layer (GRUCell, LSTMCell, ...) returns a tuple of output and state, but I built my model using only the output and never touched the state. I just passed the output to a fully connected layer, reshaped it, and computed the loss with tf.losses.softmax_cross_entropy, roughly like the sketch below.
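Schematically, my setup looks something like this (the sizes, tf.nn.dynamic_rnn, and tf.layers.dense here are illustrative placeholders, not my exact code):

# Illustrative sizes only.
batch_size, num_steps, input_size, lstm_size, vocab_size = 32, 20, 64, 128, 10000

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_size])
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)

# dynamic_rnn returns (outputs, final_state); the final state is ignored here,
# which is the point of my question.
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# Reshape the per-timestep outputs and project to vocabulary logits.
outputs_flat = tf.reshape(outputs, [-1, lstm_size])            # [batch * steps, lstm]
logits = tf.layers.dense(outputs_flat, vocab_size)             # [batch * steps, vocab]
onehot = tf.one_hot(tf.reshape(targets, [-1]), vocab_size)     # [batch * steps, vocab]

loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot, logits=logits)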
Does TensorFlow (I'm using 1.3) perform proper backpropagation automatically, even if the hand-written backprop part is absent?
According to Backpropagation (through time) code in Tensorflow, yes: TensorFlow performs automatic differentiation, which effectively implements BPTT for you.
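For example, if you build an optimizer on top of the loss computed from the unrolled graph, the backward pass through all num_steps applications of the LSTM cell is generated for you. A minimal sketch, assuming a loss tensor like the one in your snippet and a made-up learning rate:

# No hand-written backward pass: minimize() differentiates the loss
# through every time step of the unrolled LSTM graph.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)

# The gradients can also be inspected directly if you are curious:
grads = tf.gradients(loss, tf.trainable_variables())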
Does adding the BPTT implementation code increase prediction accuracy noticeably?
Your link is now broken, but maybe they included that code just to show an equivalent computation? I don't see any reason to believe it would improve accuracy.
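If it helps, here is a sketch of the two training-loop variants in question, reusing the names from your snippet plus a train_op like the one above (my assumption, not part of the tutorial). The backward pass is generated automatically in both cases; carrying numpy_state across batches only changes which initial state is fed forward:

# Variant 1: carry the final LSTM state from one batch into the next
# (this is what the tutorial snippet does).
numpy_state = session.run(initial_state)
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss, _ = session.run(
        [final_state, loss, train_op],
        feed_dict={initial_state: numpy_state,
                   words: current_batch_of_words})

# Variant 2: start every batch from a zero state and ignore final_state.
zero_state = session.run(initial_state)  # all zeros, never updated
for current_batch_of_words in words_in_dataset:
    current_loss, _ = session.run(
        [loss, train_op],
        feed_dict={initial_state: zero_state,
                   words: current_batch_of_words})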