
Running tmux from a GCP startup script


I'm trying to use spot VMs and am currently setting up my workflow. The last piece I haven't figured out is launching a tmux session that runs my experiments, so that I can occasionally ssh into the spot VM to check their status without the process dying after I disconnect.

This is my startup script (user is a dummy username):

#!/bin/bash

sudo -u user bash <<EOF
cd /home/user/

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda3.sh 2> log_miniconda
bash /tmp/miniconda3.sh -b -p /home/user/miniconda3 2> log_minicondash

/home/user/miniconda3/bin/conda init bash
source /home/user/miniconda3/etc/profile.d/conda.sh
source /home/user/.bashrc

conda create -y -n test python=3.8

gsutil cp -r gs://my-bucket/spot_vm /home/user/

tmux start-server && echo "--- Line 1 OK" >>/tmp/debug.txt
tmux new -d -s main 'sleep 1; cd /home/user/spot_vm; eval "$(conda shell.bash hook)"; conda activate test; python spot_test.py' && echo "--- Line 2 OK" >>/tmp/debug.txt
EOF

spot_test.py is supposed to simply write the current time to a file every minute, just to check that the VM is running. However, the file is never created. The output of the startup script is:

Sep 01 16:14:13 test google_metadata_script_runner[834]: startup-script: Operation completed over 48 objects/42.1 KiB.
Sep 01 16:14:13 test sudo[1631]: pam_unix(sudo:session): session closed for user user
Sep 01 16:14:13 test google_metadata_script_runner[834]: startup-script exit status 0
Sep 01 16:14:13 test google_metadata_script_runner[834]: Finished running startup scripts.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Succeeded.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Unit process 1955 (tmux: server) remains running after unit stopped.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Unit process 1956 (bash) remains running after unit stopped.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Unit process 1958 (sleep) remains running after unit stopped.
Sep 01 16:14:13 test systemd[1]: Finished Google Compute Engine Startup Scripts.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Consumed 31.619s CPU time.

The log says that a tmux server process is still running, but I can't find it with tmux ls. I have found several questions on Stack Overflow about this problem, but none of the answers fixed it here. Can somebody help, please?


Solution

  • The problem turned out to be that tmux, when launched from a startup script like this, couldn't activate the conda environment: it reported that conda wasn't properly initialised. The fix was to run conda activate test before starting tmux, which solves it for reasons that aren't entirely clear to me.
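
One likely contributor, offered as an observation rather than a confirmed diagnosis: the script's heredoc uses an unquoted EOF delimiter. With an unquoted delimiter, bash expands $(...) and $VAR in the heredoc body in the outer shell (here, the root shell running the startup script) before the text ever reaches sudo -u user bash, and quotes inside the body do not prevent that. So $(conda shell.bash hook) would be evaluated as root, where conda is not on the PATH, leaving eval with an empty string. The minimal sketch below shows the same effect with a plain variable; the names WHO, unquoted, and quoted are illustrative only and do not come from the original script.

```shell
#!/bin/bash
# Value visible to the outer shell (stands in for root's environment).
WHO=outer

# Unquoted delimiter: "$WHO" is expanded by the OUTER shell while the
# heredoc is built, so the inner bash receives the literal text: echo "outer"
unquoted=$(bash <<EOF
WHO=inner
echo "$WHO"
EOF
)

# Quoted delimiter: the body is passed through untouched, so the INNER
# bash performs the expansion after its own assignment has run.
quoted=$(bash <<'EOF'
WHO=inner
echo "$WHO"
EOF
)

echo "unquoted heredoc saw: $unquoted"
echo "quoted heredoc saw:   $quoted"
```

Applied to the startup script above, this suggests two options: write <<'EOF' so that the conda hook is evaluated by the user's shell inside the tmux command, or, as in the fix described, activate the environment before tmux is started so that the new session presumably inherits the already-activated environment.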