
How to preserve pytorch training on AWS after ssh disconnection?


I am currently connecting to Amazon AWS EC2 through ssh, and I use it to train a pytorch network.

The problem is that whenever I close the local ssh terminal window, training on AWS is interrupted. When I connect again, there are no processes running.

How can I keep the network training on AWS all the time, such that if I close the ssh terminal window, I can still connect again and find the network training?

Update:

Running python main.py & does not work. If I close the ssh window, connect again, and run top, I find no processes.


Solution

  • You could use a terminal multiplexer like tmux or screen.
    This way you can detach from your screen without killing the running process.

    For instance, with tmux, a few things to get started: running tmux creates a new session. Press Ctrl-b, then d, to detach from it; run tmux a to reattach to the most recent session; and press Ctrl-b, then s, to list the live sessions.
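
The same workflow can also be driven with explicit tmux commands rather than key bindings. The following is a minimal sketch, not a definitive setup: the session name train and the helper function names are hypothetical, and it assumes tmux is installed on the instance and that main.py (from the question) is in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: run training inside a detached tmux session so it keeps
# running after the ssh connection is closed.
# Assumption: the session name "train" is arbitrary (hypothetical).
SESSION="train"

# Start training in a new session named $SESSION (-s) without
# attaching to it (-d). The session ends when the command exits.
start_training() {
    tmux new-session -d -s "$SESSION" "python main.py"
}

# Reattach to the session after logging back in over ssh.
attach_training() {
    tmux attach-session -t "$SESSION"
}

# Show the sessions that are still alive on the machine.
list_sessions() {
    tmux list-sessions
}
```

After sourcing these functions, call start_training once, close the ssh window, and later run attach_training from a new ssh connection to see the training output again. Naming the session makes it easier to find when several sessions are running.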