Pass bash variables to a list of commands stored in a separate "commands.txt" file


I have the following PBS script:

#!/bin/bash -l
#PBS -l walltime=12:00:00,nodes=1:ppn=24,pmem=2580mb

((start=24))
((n_jobs_procimg=8))

cd $PBS_O_WORKDIR
conda activate msi_sip_37

module load parallel
parallel -u -j 3 < commands.txt
wait

And the contents of commands.txt are:

python runs/process_img.py --n_jobs $n_jobs_procimg --msi_test True --idx_min $start+0 --idx_max $start+8
python runs/process_img.py --n_jobs $n_jobs_procimg --msi_test True --idx_min $start+8 --idx_max $start+16
python runs/process_img.py --n_jobs $n_jobs_procimg --msi_test True --idx_min $start+16 --idx_max $start+24

I [incorrectly] expect that $start and $n_jobs_procimg should be available to the commands in commands.txt, but when I run this job, I get the following error for each command in commands.txt:

usage: process_img.py [-h] [-n N_JOBS] [-m MSI_TEST] [-i IDX_MIN] [-d IDX_MAX]
process_img.py: error: argument -n/--n_jobs: expected one argument

How do I modify the parallel command in the PBS script so that $start and $n_jobs_procimg are passed to commands.txt?

In this case, $start should be an integer equal to 24 and $n_jobs_procimg should be an integer equal to 8.

It's not relevant to my question, but for context: process_img.py uses a multiprocessing pool that takes the number of processing cores as an argument (--n_jobs), and I want to control that based on the total number of cores I have available.


Solution

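  • Export the variables. parallel runs each line of commands.txt in a child shell, and a child process only inherits variables that have been exported. Without the export, $n_jobs_procimg expands to nothing, which is why --n_jobs ends up with no argument: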

    #!/bin/bash -l
    #PBS -l walltime=12:00:00,nodes=1:ppn=24,pmem=2580mb
    
    start=24
    n_jobs_procimg=8
    export start
    export n_jobs_procimg
    
    cd $PBS_O_WORKDIR
    conda activate msi_sip_37
    
    module load parallel
    parallel -u -j 3 < commands.txt
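
One more thing to watch for once the variables are exported: the shell does not do arithmetic on `$start+8`. It only concatenates, so the script would receive the literal string `24+8`. If process_img.py expects integers there (an assumption, since its argument parser is not shown), the lines in commands.txt would need arithmetic expansion, e.g. `--idx_min $((start+8))`. The sketch below illustrates both effects without needing parallel installed; `bash -c` is only a stand-in for the child shell that parallel starts for each command, and the result variable names are made up for the demonstration.

```shell
#!/bin/bash
# Stand-in for lines of commands.txt: `bash -c` plays the role of the
# child shell that GNU parallel starts for each command.
start=24                 # plain shell variable: not inherited by children
n_jobs_procimg=8
export n_jobs_procimg    # exported: copied into every child's environment

# Single quotes keep the parent shell from expanding the variables,
# so the child shell has to look them up itself.
without_export=$(bash -c 'printf "%s\n" "--idx_min $start+8"')
with_export=$(bash -c 'printf "%s\n" "--n_jobs $n_jobs_procimg"')

export start
literal=$(bash -c 'printf "%s\n" "--idx_max $start+16"')        # string concatenation
computed=$(bash -c 'printf "%s\n" "--idx_max $((start+16))"')   # arithmetic expansion

printf '%s\n' "$without_export"   # --idx_min +8
printf '%s\n' "$with_export"      # --n_jobs 8
printf '%s\n' "$literal"          # --idx_max 24+16
printf '%s\n' "$computed"         # --idx_max 40
```

The first result shows the failure mode from the question (the variable is empty in the child), the second shows the effect of export, and the last two show why `$((...))` is needed if the indices should be computed rather than passed as text.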