google-cloud-platform google-compute-engine google-cloud-sdk

Can't keep SSH connection to VM using gcloud-sdk


I have a Google Cloud Deep Learning VM image for PyTorch, and I connect to the Jupyter Notebook on it over SSH. How can I change what I am currently doing so that the Jupyter Notebook stays alive even when I close my laptop or temporarily lose my internet connection?

Currently, after turning on my VM and opening a tmux window, I start the Jupyter Notebook and its SSH connection with this command:

gcloud compute ssh <my-server-name> -- -L 8080:localhost:8080

This code is taken from the official docs for the deep learning images here: https://cloud.google.com/deep-learning-vm/docs/jupyter

I can then connect at localhost:8080 and do what I need to. However, if I start a long training run and need to close my laptop, then when I re-open it the SSH connection is broken, the Jupyter Notebook has shut down, and the training run has been interrupted.

How can I keep this Jupyter Notebook alive and reconnect to it later?

NB: I used to use the Google Cloud browser SSH option and, once on the server, start a tmux window and run the Jupyter Notebook inside it. This worked great and meant the notebook was always alive. However, with the Google Cloud images that have CUDA and Jupyter preinstalled, this doesn't work, and the only way I have been able to connect is through the above command.


Solution

  • I have faced this problem before on GCP too and found a simple way to resolve it. Once you have SSH'd into the Compute Engine instance, run the Linux screen command. This puts you in a virtual terminal (you can open several in parallel), and this is where you want to run your long-running job.

    Once you have started the job, detach from the screen session by pressing Ctrl+a and then d. Once detached, you can exit the VM; when you later reconnect to the VM and run screen -r, you will find that your job is still running.

    Of course, you can do a lot more with the screen command, and I would encourage you to read some of the tutorials found here.

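    NOTE: Please ensure that your Compute Engine instance is not a preemptible machine! Compute Engine can stop a preemptible instance at any time, which would kill the job regardless of screen.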

    Let me know if this helps!
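
As a rough sketch of the steps above, here is the sequence of commands written out. The instance name, session name, and port are placeholders (the session name "training" is made up for illustration; the port forward is the one from the question). The snippet only prints the commands so you can check them before running each one by hand in the right place.

```shell
#!/bin/sh
# Placeholders: replace with your own values. "training" is a hypothetical
# session name, not something the image sets up for you.
INSTANCE="my-server-name"
SESSION="training"
PORT=8080

# Step 1 (on your laptop): SSH into the VM with the port forward from the question.
ssh_cmd="gcloud compute ssh $INSTANCE -- -L $PORT:localhost:$PORT"

# Step 2 (on the VM): start a named screen session, then launch the job inside it.
start_cmd="screen -S $SESSION"

# Step 3 (on the VM): detach with Ctrl+a then d. After reconnecting with the
# command from step 1, reattach to the session by name.
reattach_cmd="screen -r $SESSION"

# Print the commands in order rather than executing them.
printf '%s\n' "$ssh_cmd" "$start_cmd" "$reattach_cmd"
```

Naming the session with -S is optional; it just makes it easier to pick the right one if you have several, and screen -ls will list whatever sessions are currently running on the VM.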