Passing static variables to GNU Parallel


In a bash script, I am trying to pass multiple distinct fastq files and several user-provided static variables to GNU Parallel. I can't hardcode the static variables: they do not change within a run of the script, but they are set by the user and differ between runs. I have tried a few different approaches, but I get the error: argument -b/--bin: expected one argument

Attempt 1:

    binSize="10000"
    outputDir="output"
    errors="1"
    minReads="10"

    ls fastq_F* | parallel "python myscript.py -f split_fastq_F{} -b $binSize -o $outputDir -e $errors -p -t $minReads"

Attempt 2:

    my_func() {
      python InDevOptimizations/DemultiplexUsingBarcodes_New_V1.py \
             -f split_fastq_F$1 \
             -b $binSize \
             -o $outputDir \
             -e $errors \
             -p \
             -t $minReads
    }
    export -f my_func

    ls fastq_F* | parallel my_func

It seems clear that I am not correctly passing the static variables... but I can't seem to grasp what the correct way to do this is.


Solution

  • Always try --dry-run (--dr for short) when GNU Parallel does not do what you expect. It prints the commands that would be run instead of running them.

    binSize="10000"
    outputDir="output"
    errors="1"
    minReads="10"
    
    ls fastq_F* | parallel --dr "python myscript.py -f split_fastq_F{} -b $binSize -o $outputDir -e $errors -p -t $minReads"

    In Attempt 1 you are using double quotes (") and not single quotes ('), so the variables should be substituted by the shell before GNU Parallel starts.

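    If the commands are run locally (i.e. not on remote servers), you can use export VARIABLE so that the jobs, including an exported function such as my_func in Attempt 2, can see the value.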

    If run on remote servers, use env_parallel:

    env_parallel --session
    
    alias myecho='echo aliases'
    env_parallel -S server myecho ::: work
    myfunc() { echo functions $*; }
    env_parallel -S server myfunc ::: work
    myvar=variables
    env_parallel -S server echo '$myvar' ::: work
    myarray=(arrays work, too)
    env_parallel -k -S server echo '${myarray[{}]}' ::: 0 1 2
    
    env_parallel --end-session
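
The two local cases above (substitution inside double quotes, and export for a function like my_func) can be checked without GNU Parallel. This is only a rough sketch under some assumptions: echo stands in for the python call, bash -c stands in for the new shell that each job runs in, and fastq_F1 is a made-up file name.

```shell
#!/usr/bin/env bash
# Sketch only: echo replaces the python call, and bash -c replaces the
# shell that a job would run in. fastq_F1 is a hypothetical file name.
binSize="10000"
outputDir="output"

# Attempt 1 style: double quotes, so this shell fills in the values up front.
double_quoted="-b $binSize -o $outputDir"
# Single quotes would pass the literal text on to the job's shell instead.
single_quoted='-b $binSize -o $outputDir'

# Attempt 2 style: a function that reads the variables when it runs.
my_func() {
  echo "-f $1 -b $binSize -o $outputDir"
}
export -f my_func

# Variables not exported yet: the child shell sees empty values,
# so -b is left without an argument.
before_export=$(bash -c 'my_func "$1"' _ fastq_F1)

# Variables exported: the child shell inherits the values.
export binSize outputDir
after_export=$(bash -c 'my_func "$1"' _ fastq_F1)

printf '%s\n' "$double_quoted" "$single_quoted" "$before_export" "$after_export"
```

If this matches what happens in your script, the likely fix for Attempt 2 is to add export binSize outputDir errors minReads before the ls fastq_F* | parallel my_func line.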