I have a Python script job.py which accepts command-line arguments. The script uses the subprocess package to run some external programs. Both the script and the external programs are sequential (i.e. no MPI, OpenMP, etc.). I want to run this script 4 times, each time with different command-line arguments. My processor has 4 cores, so I would like to run all 4 instances simultaneously. If I open 4 terminals and run one instance of the script in each terminal, it works perfectly and I get exactly what I want.
Now I want to make it easier to launch the 4 instances with a single command from a single terminal. For this I use a bash script batch.sh:
python job.py 4 0 &
python job.py 4 1 &
python job.py 4 2 &
python job.py 4 3 &
This does not work. It turns out that subprocess is the culprit here. All the Python code runs perfectly until it hits subprocess.call, after which I get:
[1]+ Stopped python job.py 4 0
The way I see it, I am trying to run job.py in the background, and job.py itself tries to run something else in the background via subprocess. This apparently does not work, for reasons I do not understand.
Is there a way to run job.py multiple times without requiring multiple terminals?
EDIT #1
On recommendation I tried the multiprocessing, thread and threading packages. In the best case only one instance ran properly. As an ugly workaround that does work, I made a bash script which launches each instance in a new terminal:
konsole -e python job.py 4 0
konsole -e python job.py 4 1
konsole -e python job.py 4 2
konsole -e python job.py 4 3
EDIT #2
Here is the actual function that uses subprocess.call (note: subprocess is imported as sp):
def run_case(path):
    case = path['case']
    os.chdir(case)
    cmd = '{foam}; {solver} >log.{solver} 2>&1'.format(foam=CONFIG['FOAM'],
                                                       solver=CONFIG['SOLVER'])
    sp.call(['/bin/bash', '-i', '-c', cmd])
Let me fill in the blank spots:
CONFIG is a globally defined dictionary.
CONFIG['FOAM'] = 'of40', which is an alias in my .bashrc used to source a file belonging to the binary I'm running.
CONFIG['SOLVER'] = 'simpleFoam', which is the binary I'm running.
EDIT #3
I finally got it to work with this:
def run_case():
    case = CONFIG['PATH']['case']
    os.chdir(case)
    cmd = 'source {foam}; {solver} >log.simpleFoam 2>&1'.format(foam=CONFIG['FOAM'],
                                                                solver=CONFIG['SOLVER'])
    sp.call([cmd], shell=True, executable='/bin/bash')
The solution was to set both shell=True
and executable='/bin/bash'
instead of including /bin/bash
in the actual command-line to pass to the shell. NOTE: foam
is now a path to a file instead of an alias.
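For what it's worth, my guess (an assumption, not something I have verified) is that the earlier failure came from the -i flag: an interactive bash wants access to the controlling terminal, and when its process group is in the background that terminal access gets it stopped (SIGTTIN/SIGTTOU), which would match the Stopped message above. A minimal sketch that should reproduce it when launched with a trailing &:

# repro.py -- hypothetical minimal reproduction of the Stopped behaviour
import subprocess as sp

# The -i makes bash interactive (so .bashrc aliases work), but an interactive
# shell also tries to take over the terminal; running 'python repro.py &'
# should leave the job reported as Stopped instead of printing the message.
sp.call(['/bin/bash', '-i', '-c', 'echo hello'])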
You can parallelize from within Python:
import multiprocessing
import subprocess

def run_job(spec):
    ...
    if spec ...:
        subprocess.call(...)

def run_all_jobs(specs):
    pool = multiprocessing.Pool()
    pool.map(run_job, specs)
It has the advantage of letting you monitor/log/debug the parallelization.
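For example, the specs could simply mirror the batch.sh invocations from the question; this is just a sketch of one way to wire it up, with each worker launching one sequential job.py instance:

import multiprocessing
import subprocess

def run_job(index):
    # One sequential instance, equivalent to 'python job.py 4 <index> &' in batch.sh
    subprocess.call(['python', 'job.py', '4', str(index)])

def run_all_jobs():
    pool = multiprocessing.Pool(processes=4)  # one worker per core
    pool.map(run_job, range(4))               # blocks until all four jobs finish
    pool.close()
    pool.join()

if __name__ == '__main__':
    run_all_jobs()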