Tags: pytorch, google-colaboratory, data-persistence

Google Colab: "Unable to connect to the runtime" after uploading a PyTorch model from local


I am using a simple (not necessarily efficient) method for saving and reloading a PyTorch model.

import torch
from google.colab import files

torch.save(model, filename) # save a trained model on the VM
files.download(filename) # download the model to local

best_model = files.upload() # select the model just downloaded
best_model[filename] # access the model

Colab disconnects during execution of the last line. Hitting RECONNECT always shows ALLOCATING -> CONNECTING (which fails, with an "Unable to connect to the runtime" message in the bottom-left corner) -> RECONNECT. At the same time, executing any cell gives the error "Failed to execute cell, Could not send execute message to runtime: [object CloseEvent]".

I know it is related to the last line, because I can successfully connect with my other Google accounts, which haven't executed it.

Why does this happen? It seems that the Google accounts which have executed the last line can no longer connect to the runtime.

Edit:

One night later, I could reconnect with that Google account after the session expired. I tried the approach suggested in the comment and found that merely calling files.upload() on the PyTorch model file is enough to trigger the problem: once the upload completes, Colab disconnects.


Solution

  • (I wrote this answer before reading your update; I think it may still help.)

    files.upload() is just for uploading files. There is no reason to expect it to return a PyTorch type/model.

    When you call a = files.upload(), a is a dictionary mapping each filename to a big bytes object:

    {'my_image.png': b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR....' }
    type(a['my_image.png'])   # <class 'bytes'>
    

    Just like what you get from open('my_image.png', 'rb').read().

    So I think the next line, best_model[filename], tries to display that whole huge bytes array as the cell's output, which is what breaks Colab. If you actually want the model back, see the sketch below.
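
    Here is a minimal sketch of how to recover a usable model from the upload, assuming the file was created with torch.save(model, filename) exactly as in the question (the variable names uploaded, filename and best_model are illustrative). torch.load accepts either a path or a file-like object, so you can load straight from the uploaded bytes, or write them back to the VM's disk first.

    import io
    import torch
    from google.colab import files

    uploaded = files.upload()           # {filename: raw bytes}; assign it, don't echo it
    filename = next(iter(uploaded))     # name of the file you just selected

    # Option 1: load directly from the in-memory bytes
    # (newer PyTorch versions may need torch.load(..., weights_only=False) for a fully pickled model)
    best_model = torch.load(io.BytesIO(uploaded[filename]))

    # Option 2: write the bytes back to the VM's filesystem, then load from the path
    with open(filename, 'wb') as f:
        f.write(uploaded[filename])
    best_model = torch.load(filename)

    Either way, keep the result in a variable rather than evaluating the raw bytes as the last expression of a cell, so Colab never tries to render the multi-megabyte dump in the output area.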