I have a number of jobs. Typically I start the jobs manually by opening a number of terminal windows, and in each terminal window setting certain environment variables to different values and then invoking my programs manually. For example:
Terminal 1 commands:
export OMP_NUM_THREADS=4
./run_application1.sh
Terminal 2 commands:
export OMP_NUM_THREADS=10
./run_application2.sh
.
.
.
Terminal 8 commands:
export OMP_NUM_THREADS=5
./run_application8.sh
As you can see, in each terminal I invoke some application (run_applicationX.sh), and each one uses a different value for OMP_NUM_THREADS. Now I want to write a script (bash or python, whichever is most suitable) that generalizes this. In other words, I want to pass a job count (say --jobs=2), an array A[] whose length equals --jobs, and a list of N applications (run_application1.sh, ..., run_applicationN.sh). The script should then execute all N applications so that at most --jobs applications are running in parallel at any instant. Furthermore, each application should use the value in A[current job number] for its environment variable. In other words, I am looking for something like this:
parfor i=1...N
export OMP_NUM_THREADS=${A[JOB NUMBER]}
./run_application{i}.sh
where at most --jobs applications ever run in parallel. What is the best way to do this? I know that the GNU parallel tool could be used for this, but I am not sure how to assign a different set of environment variables based on the current job number. Note that the job number is an integer between 1 and --jobs, which guarantees that the same set of environment variable values is never used by two applications simultaneously. Thanks
It is unclear to me what you want, but let's see if we can build it together.
app1() {
  export OMP_NUM_THREADS=$1
  sleep 1
  echo app1 $OMP_NUM_THREADS
}
app2() {
  export OMP_NUM_THREADS=$1
  sleep 1
  echo app2 $OMP_NUM_THREADS
}
app3() {
  export OMP_NUM_THREADS=$1
  sleep 1
  echo app3 $OMP_NUM_THREADS
}
app4() {
  export OMP_NUM_THREADS=$1
  sleep 1
  echo app4 $OMP_NUM_THREADS
}
export -f app1 app2 app3 app4
parallel app{1} {2} ::: 1 2 3 4 :::+ 2 3 5 7
Or compute OMP_NUM_THREADS from the job sequence number using Perl code:
seq 4 | parallel app{} '{= $_= seq()*seq()+1 =}'
To guarantee that no two simultaneously running jobs use the same value (often needed for CUDA_VISIBLE_DEVICES), you can use the job slot number:
# 0..3
seq 10 | parallel -j 4 'CUDA_VISIBLE_DEVICES={= $_=slot()-1 =} app{}'
Or:
# 1..4
seq 10 | parallel -j 4 'app{} {%}'
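If the per-slot values should come from an array like the A[] in the question, one possible approach is to look the value up by slot number inside a wrapper function. The following is only a sketch under assumptions: the names A, SLOT_THREADS and run_in_slot are made up for illustration, the echo stands in for the real ./run_applicationN.sh call, and the parallel on your PATH is assumed to be GNU parallel.

```bash
#!/usr/bin/env bash
# Hypothetical per-slot values: slot 1 gets 4 threads, slot 2 gets 10.
A=(4 10)

# Bash arrays cannot be exported, so pass the values as one string.
export SLOT_THREADS="${A[*]}"

# run_in_slot APP_NUMBER SLOT_NUMBER (illustrative helper, not part of GNU parallel)
run_in_slot() {
  local app=$1 slot=$2
  local -a threads
  # Rebuild the array from the exported string.
  read -r -a threads <<< "$SLOT_THREADS"
  # Slots are numbered from 1, bash arrays from 0.
  export OMP_NUM_THREADS=${threads[slot-1]}
  # Stand-in for: ./run_application${app}.sh
  echo "app$app slot=$slot OMP_NUM_THREADS=$OMP_NUM_THREADS"
}
export -f run_in_slot

# {} is the input value, {%} is the job slot number (1..number of jobs).
# Skipped if parallel is not installed.
if command -v parallel >/dev/null; then
  seq 5 | parallel -j "${#A[@]}" run_in_slot {} {%}
fi
```

Because a slot number is only reused after the job holding it has finished, two jobs running at the same time should never pick the same entry of A. The number of parallel jobs is taken from the length of A, so the two cannot get out of sync.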