I'm trying to use awk and GNU parallel to filter .csv.gz files based on the values in column 1 and column 2 and dump the result into a single .csv.gz file. Thanks to the answer here, I managed to write myscript.sh to do the job in parallel.
#!/bin/bash
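doit() {
  # $1 is the file passed by parallel; inside the single quotes,
  # $1 and $2 are awk fields (columns 1 and 2)
  pigz -dc "$1" | awk -F, '$1>0.5 && $2<1.5'
}
export -f doit
find "$1" -name '*.csv.gz' | parallel doit | pigz > output.csv.gz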
I then run the script in the terminal:
./myscript.sh /path/to/files
I'm wondering how I can pass 0.5 and 1.5 as arguments to myscript.sh, like this:
./myscript.sh /path/to/files 0.5 1.5
#!/bin/bash
doit() {
  # $1 $2 $3 are arguments to doit (file, lower bound, upper bound)
  # '$1' and '$2' inside single quotes are awk fields (columns 1 and 2)
  pigz -dc "$1" | awk -F, '$1>'"$2"' && $2<'"$3"
}
export -f doit
find "$1" -name '*.csv.gz' | parallel doit {} "$2" "$3" | pigz > output.csv.gz
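As an alternative to splicing the shell arguments into the awk program text, the thresholds can probably be handed to awk with -v, which keeps the awk program in one quoted piece. The following is only a sketch: the helper name filter and the awk variable names lo and hi are made up for illustration and are not part of the original script.

```shell
# Hypothetical helper: read CSV on stdin, keep rows where
# column 1 > lo and column 2 < hi. The bounds are passed with -v
# instead of being pasted into the awk program text.
filter() {
  awk -F, -v lo="$1" -v hi="$2" '$1 > lo && $2 < hi'
}

# Variant of doit built on the helper:
# $1 = file, $2 = lower bound for column 1, $3 = upper bound for column 2
doit() {
  pigz -dc "$1" | filter "$2" "$3"
}
```

If this is used in myscript.sh, both functions would need to be exported (export -f filter doit) so that parallel can see them; the find | parallel line stays the same.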
Call as (the first line creates a small comma-separated test file):
paste -d, <(seq 10 | shuf) <(seq 10 | shuf) | gzip > h.csv.gz
./myscript.sh . 5 6
zcat output.csv.gz