I'm trying to use spot VMs and am currently setting up my workflow. The last piece I haven't figured out is launching a tmux session with my experiments, so that I can occasionally ssh into the spot VM to check their status without the process dying after I disconnect.
This is my startup script (user is a dummy username):
#!/bin/bash
sudo -u user bash <<EOF
cd /home/user/
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda3.sh 2> log_miniconda
bash /tmp/miniconda3.sh -b -p /home/user/miniconda3 2> log_minicondash
/home/user/miniconda3/bin/conda init bash
source /home/user/miniconda3/etc/profile.d/conda.sh
source /home/user/.bashrc
conda create -y -n test python=3.8
gsutil cp -r gs://my-bucket/spot_vm /home/user/
tmux start-server && echo "--- Line 1 OK" >>/tmp/debug.txt
tmux new -d -s main 'sleep 1; cd /home/user/spot_vm; eval "$(conda shell.bash hook)"; conda activate test; python spot_test.py' && echo "--- Line 2 OK" >>/tmp/debug.txt
EOF
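One detail of the script above that may matter: the heredoc delimiter EOF is unquoted, so bash expands $(...) and $VAR inside the body in the outer shell, before the inner bash started by sudo ever sees them. If that applies here, $(conda shell.bash hook) would be evaluated by the outer shell, where conda may not be on the PATH. That is a guess about this particular script, not something confirmed. The minimal sketch below only demonstrates the general quoting behaviour; the variable WHO is made up for illustration.

```shell
#!/bin/bash
# Make sure the illustrative variable is not set in the outer shell.
unset WHO

# Unquoted delimiter: $WHO is expanded by the OUTER shell (where it is unset),
# so the inner bash receives the line: echo "who="
unquoted=$(bash <<EOF
WHO=inner
echo "who=$WHO"
EOF
)

# Quoted delimiter: the body is passed through literally,
# so the INNER bash expands $WHO after assigning it.
quoted=$(bash <<'EOF'
WHO=inner
echo "who=$WHO"
EOF
)

echo "unquoted: $unquoted"   # prints "unquoted: who="
echo "quoted:   $quoted"     # prints "quoted:   who=inner"
```

If this turns out to be relevant, quoting the delimiter (<<'EOF') or escaping the dollar sign (\$(...)) would defer the expansion to the inner shell.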
The spot_test.py script is supposed to simply write the time to a file every minute, just to check that the VM is running. However, the file is not created. The output of the startup script is:
Sep 01 16:14:13 test google_metadata_script_runner[834]: startup-script: Operation completed over 48 objects/42.1 KiB.
Sep 01 16:14:13 test sudo[1631]: pam_unix(sudo:session): session closed for user user
Sep 01 16:14:13 test google_metadata_script_runner[834]: startup-script exit status 0
Sep 01 16:14:13 test google_metadata_script_runner[834]: Finished running startup scripts.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Succeeded.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Unit process 1955 (tmux: server) remains running after unit stopped.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Unit process 1956 (bash) remains running after unit stopped.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Unit process 1958 (sleep) remains running after unit stopped.
Sep 01 16:14:13 test systemd[1]: Finished Google Compute Engine Startup Scripts.
Sep 01 16:14:13 test systemd[1]: google-startup-scripts.service: Consumed 31.619s CPU time.
It's saying that a tmux unit is running, but I can't find it with tmux ls. I have found several questions on Stack Overflow about this problem, but none of the answers fixed it for me. Can somebody help, please?
The problem ended up being that tmux, when run in a startup script like this, couldn't activate a conda environment; it complained that conda wasn't properly initialised. The fix was to run conda activate test before starting tmux, which somehow solves it.
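For anyone landing here, this is roughly what the reordered end of the startup script could look like. It is a sketch under assumptions, not a verified drop-in: the paths, the environment name test and the session name main are taken from the question, the function name launch_experiment is made up, and the explanation in the comments (the tmux server inheriting the already-activated environment) is my best guess at why the reordering helps.

```shell
#!/bin/bash
# Assumption: this runs as the target user, after Miniconda has been installed.
CONDA_SH=/home/user/miniconda3/etc/profile.d/conda.sh

launch_experiment() {
    # Load conda's shell function into this non-interactive shell.
    source "$CONDA_SH"
    # Activate the environment BEFORE tmux starts, so that the tmux server
    # (presumably) inherits the activated PATH and CONDA_* variables.
    conda activate test
    # Detached session; the command no longer calls conda itself.
    tmux new -d -s main 'cd /home/user/spot_vm && python spot_test.py'
}

# Only launch when both tools are actually present on this machine.
if [ -f "$CONDA_SH" ] && command -v tmux >/dev/null 2>&1; then
    launch_experiment
else
    echo "conda or tmux not found; skipping launch" >&2
fi
```

Note that tmux ls only lists sessions on the server socket belonging to the calling user, so if you ssh in under a different account than the one that started the session, something like sudo -u user tmux attach -t main may be needed to see it.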