There are two Python scripts involved in this task.
My current task requires me to run a long process (the first Python script, which takes about a day or two per run) on a GCP VM instance in each of the 29 available regions. To finish as quickly as possible, I spin up all 29 VMs at once and want to start the process on every instance at the same time.
Because SSH-ing into each instance to run the first script manually is cumbersome, I wrote another Python script (the second script) that SSHs into each region's VM and runs the first script.
The problem is that the second script does not start the first script on the second region's VM until the run on the first region's VM has finished. I need it to start the first script in every region without waiting for any of those processes to end.
I use subprocess.run() in the second script to run the first script on each VM.
The following code is the second script:
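import subprocess

# zipped_zone_instance: (zone, instance) pairs for the 29 VMs, built elsewhere in the script
for zone, instance in zipped_zone_instance:
    command = "gcloud compute ssh --zone " + zone + " " + instance + " --project cloud-000000 --command"
    command_lst = command.split(" ")
    # The remote command must stay one argument, so it is appended after splitting.
    command_lst.append("python3 /home/first_script.py")
    # run() blocks until the SSH session (and therefore the remote script) finishes.
    subprocess.run(command_lst)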
I need subprocess.run(command_lst) to launch the first script in all 29 zones at once, rather than starting the second zone only after the first zone's process has ended.
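A small local experiment, independent of gcloud, reproduces this behavior: subprocess.run() waits for each child process to exit before returning, so a loop of run() calls is strictly sequential. The sketch below assumes a one-second sleep in a child Python interpreter (sys.executable) as a stand-in for the long remote job; the helper name time_sequential is hypothetical.

```python
import subprocess
import sys
import time

# Stand-in for a long-running remote job: a child Python process that sleeps for 1 s.
SLEEP_CMD = [sys.executable, "-c", "import time; time.sleep(1)"]


def time_sequential(n: int) -> float:
    """Run n copies one after another with subprocess.run and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        # run() blocks until the child exits, so each iteration waits for the previous one.
        subprocess.run(SLEEP_CMD, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_sequential(3)
    print(f"3 sequential runs took {elapsed:.1f} s")  # roughly 3 s, not 1 s
```

With 29 zones and a job lasting a day or more, the same pattern means the last zone would not start for weeks.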
The following code is the first script:
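import subprocess
import time

# bucket_lst, var_, tf_record_disk_usage, time_lst and tput_lst are defined elsewhere in the script
for idx, bucket in enumerate(bucket_lst):
    start = time.time()
    sync_src = '/home/' + 'benchmark-' + var_
    # run() blocks until the rsync finishes, so `end` below is the full transfer time.
    subprocess.run(['gsutil', '-m', '-o', 'GSUtil:parallel_composite_upload_threshold=40M', 'rsync', '-r', sync_src, bucket])
    end = time.time() - start
    time_lst.append(end)
    tput_lst.append(tf_record_disk_usage / end)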
What can I change in the second script or the first script to achieve this?
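Replace subprocess.run(command_lst) in the second script with subprocess.Popen(command_lst). Popen starts the child process and returns immediately, whereas run() waits for it to finish, so the loop can start the SSH session for every zone before any of them completes.
Keep command_lst as a list and leave out shell=True: on POSIX, passing a list together with shell=True makes the shell execute only the first element ("gcloud") and treats the remaining items as arguments to the shell itself.
The first script should keep subprocess.run(), because its timing and throughput measurements depend on each rsync finishing before time.time() is read again. To run several processes in parallel, create them in a loop and keep the Popen objects, as in the example below.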
For simplicity, the following example shows the pattern with a few arbitrary commands:
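from subprocess import Popen
commands = ['ls -l', 'date', 'which python']
# Single-string commands need shell=True; Popen returns at once, so all three run concurrently.
processes = [Popen(cmd, shell=True) for cmd in commands]
# Wait for all of them to finish before moving on.
exit_codes = [p.wait() for p in processes]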
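Applied to the second script, a minimal sketch might look like this. It passes each gcloud command as an argument list without shell=True, starts every SSH session before waiting on any of them, and then waits for all of them so their exit codes can be checked. The helper names build_ssh_command and launch_all are hypothetical; the project ID and remote path are copied from the question. The demo under `__main__` substitutes local one-second sleeps for gcloud so it can run anywhere.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def build_ssh_command(zone: str, instance: str, project: str = "cloud-000000",
                      remote_cmd: str = "python3 /home/first_script.py") -> List[str]:
    """Build the same gcloud ssh argument list as the question (hypothetical helper)."""
    return [
        "gcloud", "compute", "ssh",
        "--zone", zone, instance,
        "--project", project,
        "--command", remote_cmd,  # the remote command stays a single argument
    ]


def launch_all(commands: Sequence[List[str]]) -> List[int]:
    """Start every command at once, then wait for all of them; return their exit codes."""
    # Popen returns immediately, so all children are started before any is waited on.
    processes = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks per child, but the others keep running in parallel meanwhile.
    return [p.wait() for p in processes]


if __name__ == "__main__":
    # Local demo: three 1-second sleeps finish in about 1 s instead of about 3 s.
    demo = [[sys.executable, "-c", "import time; time.sleep(1)"]] * 3
    start = time.perf_counter()
    codes = launch_all(demo)
    print(codes, f"{time.perf_counter() - start:.1f} s")
```

In the real second script, the call would be `launch_all([build_ssh_command(zone, instance) for zone, instance in zipped_zone_instance])`. Waiting at the end is optional for launching, but it keeps the exit code of each zone's run available for checking.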