deep-learning, neural-network, recurrent-neural-network

Does Gradient Clipping reduce the effectiveness of an RNN?


In order to prevent the gradients from exploding we use gradient clipping. In element-wise clipping we pick a range such as [-10, 10]: if a gradient component exceeds 10 it is set to 10, and if it falls below -10 it is set to -10.
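For example, a minimal sketch of the element-wise clipping I mean, using PyTorch's clip_grad_value_ (the model and numbers are made up for illustration):

```python
import torch
import torch.nn as nn

# A small made-up RNN and batch, just so there are gradients to clip
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 5, 8)                  # (batch, seq_len, features)
loss = rnn(x)[0].sum()
loss.backward()

# Element-wise clipping: every gradient component outside [-10, 10]
# is replaced by -10 or 10 respectively.
torch.nn.utils.clip_grad_value_(rnn.parameters(), clip_value=10.0)
```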

Since we are replacing the gradient with a somewhat arbitrary value, why doesn't this affect the RNN's effectiveness?


Solution

  • Gradient clipping ensures that the gradient vector g has norm at most some threshold x. This helps gradient descent behave reasonably even when the loss landscape of the model is irregular. The following figure shows an example with an extremely steep cliff in the loss landscape: without clipping, the parameters take a huge descent step and leave the “good” region; with clipping, the step size is restricted and the parameters stay in the “good” region. (A minimal norm-clipping sketch is shown after this answer.)

    [figure: loss landscape with a steep cliff, comparing gradient descent with and without clipping]

    In general it does more good than harm, which is why it doesn't hurt the model's effectiveness. A more practical choice can be to normalize the gradient instead of clipping it, which may give better results (a sketch of both is shown below).
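For illustration, a minimal sketch of norm clipping with PyTorch's clip_grad_norm_ (the model and the max_norm value are made-up examples):

```python
import torch
import torch.nn as nn

# Made-up RNN and batch, just to produce gradients
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
loss = rnn(torch.randn(4, 5, 8))[0].sum()
loss.backward()

# Norm clipping: if ||g|| > max_norm, g is rescaled so that ||g|| == max_norm;
# otherwise it is left untouched. Only the length changes, not the direction.
total_norm = torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
```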
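And a sketch of the normalization alternative mentioned above, i.e. always rescaling the gradient to a fixed norm rather than only when it exceeds a threshold (normalize_grads is a hypothetical helper written for this example, not a library function):

```python
import torch

def normalize_grads(parameters, target_norm=1.0, eps=1e-6):
    """Rescale the concatenated gradient vector to a fixed norm,
    whether or not it exploded (contrast with clipping, which only
    shrinks gradients that are too large)."""
    grads = [p.grad for p in parameters if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    scale = target_norm / (total_norm + eps)
    for g in grads:
        g.mul_(scale)
```

You would call such a helper right after loss.backward() and before optimizer.step(), in the same place you would otherwise call a clipping function.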