
Passing Arguments to GNU parallel


I'm trying to use awk and GNU parallel to filter files based on the values in columns 1 and 2 and write the result to a single .csv.gz file. Thanks to the answer here, I managed to write myscript.sh to do the job in parallel.

#!/bin/bash

doit() {
    pigz -dc $1 | awk -F, '$1>0.5 && $2<1.5'
}
export -f doit


find $1 -name '*.csv.gz' | parallel doit | pigz > output.csv.gz

and then run the script in the terminal.

./myscript.sh /path/to/files

How can I pass 0.5 and 1.5 as arguments to myscript.sh, like this?

./myscript.sh /path/to/files 0.5 1.5

Solution

  • #!/bin/bash
    
    doit() {
        # Outside the single quotes, $1 $2 $3 are the shell arguments to doit
        # Inside the single quotes, $1 and $2 are awk fields (columns 1 and 2)
        pigz -dc "$1" | awk -F, '$1>'"$2"' && $2<'"$3"
    }
    export -f doit
    
    
    find "$1" -name '*.csv.gz' | parallel doit {} "$2" "$3" | pigz > output.csv.gz
    

    Call as:

    paste -d, <(seq 10 | shuf) <(seq 10 | shuf) | gzip > h.csv.gz
    ./myscript.sh . 5 6
    zcat output.csv.gz
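
    A possible variant, offered as a sketch rather than as part of the original answer: instead of splicing the shell arguments into the awk program text, hand them to awk with -v so the program can stay entirely inside single quotes. The helper name filter_rows and the awk variable names lo and hi are made up for this example, and the guard on the argument count is an assumption added so the functions can be tried on their own.

```shell
#!/bin/bash

# filter_rows LO HI: read CSV on stdin and keep the rows where
# column 1 > LO and column 2 < HI.
# -v passes the thresholds in as awk variables (lo and hi are names
# chosen for this sketch), so no shell text ends up in the awk program.
filter_rows() {
    awk -F, -v lo="$1" -v hi="$2" '$1 > lo && $2 < hi'
}

# doit FILE LO HI: decompress one file and filter its rows.
doit() {
    pigz -dc "$1" | filter_rows "$2" "$3"
}

# Export both functions so the shells started by GNU parallel can see them.
export -f filter_rows doit

# Run the pipeline only when all three arguments are given, e.g.
#   ./myscript.sh /path/to/files 0.5 1.5
if [ "$#" -eq 3 ]; then
    find "$1" -name '*.csv.gz' | parallel doit {} "$2" "$3" | pigz > output.csv.gz
fi
```

    The filter can be checked without any compressed files by piping a few rows straight into filter_rows, for example: printf '0.6,1.0\n0.4,1.0\n' | filter_rows 0.5 1.5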