Tags: tensorflow, machine-learning, deep-learning, gradient-descent

Why does the training error jump without changing the learning rate?


I'm training a convolutional network with about 10 convolutional layers and a few pooling layers. The training set is about 250,000 samples (vectors of length 16,000). About 50% of the way through the first epoch, the training and test error jumped from about 68% to 92%. The learning rate was unchanged (batch gradient descent with a batch size of 32). What caused that jump, and how should it be interpreted?

[plot: training and test error over the course of training]


Solution

  • I found this slide in Stanford's deep learning course: https://youtu.be/wEoyxE0GP2M?t=1h18m2s

    The explanation given there is that this is a symptom of bad parameter initialization: for a while there is little to no learning, then the parameters suddenly adjust enough in the right direction and you see a sharp jump in accuracy (and a corresponding drop in loss).
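
    If poor initialization is the suspected cause, one common remedy is to set an explicit, scale-aware initializer on each layer. The sketch below is only illustrative and not the asker's actual model: the layer sizes, kernel widths, class count, and the 16,000-length 1-D input shape are assumptions based on the question. It uses the Keras API with He-normal initialization for the ReLU convolutions.

      # Minimal sketch: a small 1-D conv net with explicit weight initializers.
      # All layer shapes and the number of classes are hypothetical.
      import tensorflow as tf

      model = tf.keras.Sequential([
          tf.keras.layers.Input(shape=(16000, 1)),
          tf.keras.layers.Conv1D(32, kernel_size=9, activation="relu",
                                 kernel_initializer="he_normal"),
          tf.keras.layers.MaxPooling1D(pool_size=4),
          tf.keras.layers.Conv1D(64, kernel_size=9, activation="relu",
                                 kernel_initializer="he_normal"),
          tf.keras.layers.GlobalAveragePooling1D(),
          tf.keras.layers.Dense(10, activation="softmax",
                                kernel_initializer="glorot_uniform"),
      ])

      # Plain SGD mirrors the question's (mini-)batch gradient descent setup;
      # the learning rate here is an arbitrary placeholder.
      model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])

    With He-style initialization (scaled to each layer's fan-in), the pre-activation variances stay roughly constant with depth, which usually shortens or removes the flat "no learning" phase described above.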