
Keras on Google Colaboratory: Incomplete iterations?


I'm running the Keras addition RNN (seq2seq) example - here. I've tried running it on 1. Jupyter on an Ubuntu VM, and 2. a Google Colaboratory notebook with a GPU. But on Google Colab it doesn't seem to complete all the iterations. To be more specific:

Below are the logs from the regular Jupyter notebook:

Iteration 1
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
45000/45000 [==============================] - 75s 2ms/step - loss: 1.8899 - acc: 0.3209 - val_loss: 1.7819 - val_acc: 0.3429

Below are the logs from the Google Colaboratory notebook:

Iteration 1
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
17536/45000 [==========>...................] - ETA: 10s - loss: 2.0067 - acc: 0.2934

Note that after this incomplete iteration it does not stop; instead it moves on to the next iteration. Below are the logs from the next iteration on the Colab notebook:

Iteration 2
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
34688/45000 [======================>.......] - ETA: 2s - loss: 1.7466 - acc: 0.3562

Note that I'm using the same code in both environments. I do not understand what is happening here. Why are the iterations not completed on the Google Colab notebook? Is it something related to the GPUs on Google Colab? How do I fix this? Any pointers will be appreciated. Thank you!


Solution

  • I faced this problem too. Colab provides only a limited amount of memory (about 12 GB) in the cloud, which causes many issues when training a model. In my case, only 300 images were used for training and testing. When the images were preprocessed at 600x600 and the batch size was set to 128, the Keras model froze during epoch 1. No error was shown; the actual cause was the limited runtime memory, which Colab could not handle because it only provides about 12 GB. The problem was solved by reducing the batch size to 4 and the image dimensions to 300x300 (with 600x600 it still did not work). In short, the recommended fix is to make the image dimensions and batch size small, then run again, reducing them further until there are no more runtime errors.
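A rough back-of-the-envelope calculation shows why shrinking both values helps. This is an illustrative sketch, not part of the original example: it counts only the raw input tensors, assuming float32 RGB images (4 bytes per value); real memory usage is considerably higher once activations and gradients are included.

```python
# Estimate the memory taken by one batch of input images.
# Assumptions (not from the original post): RGB images (3 channels)
# stored as float32 (4 bytes per value).
def batch_input_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    return batch_size * height * width * channels * bytes_per_value

# Settings that froze the model: batch size 128, 600x600 images.
big = batch_input_bytes(128, 600, 600)
# Reduced settings that worked: batch size 4, 300x300 images.
small = batch_input_bytes(4, 300, 300)

print(f"128 x 600x600: {big / 1e6:.0f} MB per batch")    # ~553 MB
print(f"  4 x 300x300: {small / 1e6:.1f} MB per batch")  # ~4.3 MB
```

Over half a gigabyte per batch for the inputs alone, before the model's activations and gradients, is easily enough to exhaust a ~12 GB runtime; the reduced settings cut that by a factor of 128.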