I'm training a convolutional network with about 10 convolutional layers and a few pooling layers on a training set of roughly 250,000 samples (vectors of length 16,000). About halfway through the first epoch, the training and test accuracy jumped from about 68% to 92%. The learning rate was unchanged (mini-batch gradient descent with a batch size of 32). What caused that jump, and how should it be interpreted?
I found a relevant slide in Stanford's deep learning course: https://youtu.be/wEoyxE0GP2M?t=1h18m2s
The explanation given there is that this is a symptom of bad parameter initialization. With a poor initialization, there is little to no learning for a while; then the parameters suddenly adjust enough in the right direction, and you see a sharp jump in accuracy and a corresponding drop in loss.
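If that diagnosis fits, the usual remedy is to initialize the weights so the activation variance stays roughly constant across layers (e.g. He/Kaiming initialization for ReLU networks), so gradients are informative from the first batch instead of only after the weights drift into a usable range. Below is a minimal sketch in PyTorch; the small 1-D conv stack and the 10-class output layer are placeholders standing in for the network described in the question, not its actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical 1-D conv stack standing in for the ~10-layer network in the
# question (inputs are length-16,000 vectors with a single channel).
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=7), nn.ReLU(), nn.MaxPool1d(4),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 10),  # 10 output classes, assumed purely for illustration
)

def init_weights(m):
    # He/Kaiming initialization keeps activation variance roughly constant
    # through ReLU layers, which avoids the long "no learning" plateau that
    # a bad initialization can cause.
    if isinstance(m, (nn.Conv1d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)

# Sanity check: one forward pass with a dummy batch of 32 samples.
x = torch.randn(32, 1, 16000)
print(model(x).shape)  # torch.Size([32, 10])
```

A quick way to check whether initialization was the problem is to plot the loss curve from the start of training: a long flat stretch followed by a sudden drop is the pattern shown on that slide.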