I am trying to gain some understanding of Keras by running some of the provided examples (such as https://github.com/keras-team/keras/blob/tf-keras-2/examples/mnist_cnn.py in this particular case) and tweaking them to see what happens. However, the baseline result stated at the top of that file, 99.25% test accuracy, is far higher than what I'm getting in Google Colab (using a T4 GPU), namely 85% (0.8503000140190125).
Simply copying and pasting the linked file into Google Colab gives me the following output:
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Epoch 1/12
469/469 [==============================] - 8s 10ms/step - loss: 2.2807 - accuracy: 0.1435 - val_loss: 2.2415 - val_accuracy: 0.3414
Epoch 2/12
469/469 [==============================] - 4s 10ms/step - loss: 2.2157 - accuracy: 0.2814 - val_loss: 2.1615 - val_accuracy: 0.5900
Epoch 3/12
469/469 [==============================] - 4s 9ms/step - loss: 2.1305 - accuracy: 0.4081 - val_loss: 2.0526 - val_accuracy: 0.6552
Epoch 4/12
469/469 [==============================] - 4s 9ms/step - loss: 2.0150 - accuracy: 0.4893 - val_loss: 1.9049 - val_accuracy: 0.6928
Epoch 5/12
469/469 [==============================] - 5s 10ms/step - loss: 1.8653 - accuracy: 0.5421 - val_loss: 1.7169 - val_accuracy: 0.7290
Epoch 6/12
469/469 [==============================] - 4s 9ms/step - loss: 1.6864 - accuracy: 0.5822 - val_loss: 1.4985 - val_accuracy: 0.7573
Epoch 7/12
469/469 [==============================] - 5s 10ms/step - loss: 1.4975 - accuracy: 0.6175 - val_loss: 1.2778 - val_accuracy: 0.7841
Epoch 8/12
469/469 [==============================] - 4s 9ms/step - loss: 1.3218 - accuracy: 0.6478 - val_loss: 1.0859 - val_accuracy: 0.8070
Epoch 9/12
469/469 [==============================] - 5s 10ms/step - loss: 1.1783 - accuracy: 0.6739 - val_loss: 0.9350 - val_accuracy: 0.8256
Epoch 10/12
469/469 [==============================] - 4s 10ms/step - loss: 1.0702 - accuracy: 0.6944 - val_loss: 0.8224 - val_accuracy: 0.8354
Epoch 11/12
469/469 [==============================] - 4s 9ms/step - loss: 0.9836 - accuracy: 0.7120 - val_loss: 0.7383 - val_accuracy: 0.8433
Epoch 12/12
469/469 [==============================] - 4s 9ms/step - loss: 0.9166 - accuracy: 0.7276 - val_loss: 0.6741 - val_accuracy: 0.8503
Test loss: 0.6741476655006409
Test accuracy: 0.8503000140190125
As you can see, each epoch in Google Colab also takes much less time than the timing quoted in the comment at the top of the example file. I'd love to know if there's something I'm missing here. For example, why do they say "there is still a lot of margin for parameter tuning"? Is this supposed to be some kind of 'tutorial' where I'm supposed to tweak those parameters until I reach their 'holy grail' of 99.25%?
As per this SO answer, the issue stems from the different default learning rates of Adadelta between the old Keras implementation (learning_rate=1.0), which the example was written against, and the current TensorFlow/Keras implementation (learning_rate=0.001).
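You can confirm the current default yourself with a minimal check (assuming TensorFlow 2.x, where get_config() exposes the optimizer's hyperparameters):

import tensorflow as tf

# Inspect the default hyperparameters of the current Adadelta implementation
opt = tf.keras.optimizers.Adadelta()
print(opt.get_config()["learning_rate"])  # prints 0.001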
If you explicitly set the learning rate to 1:
optimizer=tf.keras.optimizers.Adadelta(learning_rate=1)
you will get 0.99 accuracy in 12 epochs.
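In context, that means changing the example's compile call as follows (a sketch assuming the rest of mnist_cnn.py is unchanged, where model is the Sequential model built earlier in the script):

import tensorflow as tf

# Pass the paper's learning rate of 1.0 explicitly instead of relying
# on the new default of 0.001
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adadelta(learning_rate=1.0),
              metrics=['accuracy'])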
This is also hinted at in the current API reference for Adadelta: "Note that Adadelta tends to benefit from higher initial learning rate values compared to other optimizers. To match the exact form in the original paper, use 1.0."