Tags: tensorflow, recurrent-neural-network, lstm

What does "truncated gradients" mean in LSTM?


I'm working through a TensorFlow tutorial on LSTMs: Truncated Backpropagation.

This section says the code uses "truncated backpropagation". What exactly does that mean?


Solution

  • In a neural network setting in general (well, most of the time) you perform two steps during training (both steps are sketched in code below):

    FORWARD PASS

    • Show some inputs to the net, and check the output
    • Compute a loss on the output (versus labels, or versus some behavior you want)

    BACKWARD PASS

    • With the computed loss and the state of your net, you calculate gradients to be applied to the weights of your net in order for it to learn.
    • These gradients are computed from your output layer backwards through the net.

    In the backward pass it might be that, for some reason, you only want to train the top layer, or only some specific parts of your net. In that case you want to stop the gradients from flowing backwards past that point, and that is exactly what truncating backpropagation does (often done via https://www.tensorflow.org/versions/r0.9/api_docs/python/train.html#stop_gradient). In the LSTM tutorial the truncation point is a number of time steps rather than a layer: the network is only unrolled for a fixed number of steps (num_steps in the tutorial), so gradients are propagated back through at most num_steps steps of the sequence instead of all the way to its beginning. This keeps memory and compute bounded on long sequences, at the cost of not learning dependencies longer than the truncation window.
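Here is a minimal sketch of those two passes in TF 2.x, using a tiny dense model purely for illustration (the model, shapes, and data are assumptions for the example, not taken from the tutorial):

```python
import tensorflow as tf

# Tiny illustrative model and random data (assumptions, not the tutorial's setup).
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal([8, 4])   # a batch of inputs
y = tf.random.normal([8, 1])   # the matching labels

# FORWARD PASS: show inputs to the net and compute a loss on the output.
with tf.GradientTape() as tape:
    predictions = model(x)
    loss = loss_fn(y, predictions)

# BACKWARD PASS: compute gradients of the loss w.r.t. the weights
# (from the output layer backwards) and apply them.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```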
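And here is a sketch of truncated backpropagation through time for an LSTM, again under assumed names and shapes: a long sequence is processed in chunks of num_steps, the LSTM state is carried across chunks, and tf.stop_gradient marks the truncation boundary (each fresh GradientTape already limits the backward pass to the current chunk; stop_gradient just makes the cut explicit):

```python
import tensorflow as tf

num_steps = 20  # truncation length: gradients flow back at most this many steps

lstm = tf.keras.layers.LSTM(32, return_sequences=True, return_state=True)
head = tf.keras.layers.Dense(1)
optimizer = tf.keras.optimizers.Adam()

seq = tf.random.normal([4, 100, 8])      # (batch, long sequence, features)
targets = tf.random.normal([4, 100, 1])  # per-step regression targets

state = None
for start in range(0, 100, num_steps):
    x_chunk = seq[:, start:start + num_steps]
    y_chunk = targets[:, start:start + num_steps]

    # FORWARD + BACKWARD over one num_steps-long chunk only.
    with tf.GradientTape() as tape:
        out, h, c = lstm(x_chunk, initial_state=state)
        loss = tf.reduce_mean(tf.square(head(out) - y_chunk))
    variables = lstm.trainable_variables + head.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))

    # Truncation: carry the state forward so the LSTM still "remembers",
    # but stop gradients so backprop never reaches into earlier chunks.
    state = [tf.stop_gradient(h), tf.stop_gradient(c)]
```

The carried state is what lets the model use context older than num_steps when making predictions, even though it is never trained on dependencies longer than the truncation window.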