I have some subdirectories containing .csv.gz files. Using awk, I managed to filter the files based on the values in column 1 and column 2 and dump the result into a single .csv.gz file.
pigz -rdc /path/to/dir/ | awk -F, '{ if(($1>100) && ($2>100)) {print} }' | pigz > output.csv.gz
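As an aside, awk prints the current record whenever a bare pattern evaluates to true, so the explicit if/print block can be written more compactly:

pigz -rdc /path/to/dir/ | awk -F, '$1 > 100 && $2 > 100' | pigz > output.csv.gz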
Thanks to pigz, both ends of the pipe benefit from parallel processing. I'm wondering how I can use the GNU parallel tool to execute the awk jobs in parallel as well.
doit() {
    # decompress one file and apply the same column filter
    pigz -dc "$1" | awk -F, '$1 > 100 && $2 > 100'
}
export -f doit  # make the function visible to the shells parallel spawns

find /path/to/dir -name '*.csv.gz' -type f | parallel doit | pigz > output.csv.gz
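If the single trailing pigz becomes the bottleneck, one variation (a sketch, assuming the order of rows across input files doesn't matter) moves the compression into each job. Concatenated gzip streams are themselves a valid gzip file, and parallel buffers each job's output until the job finishes, so the combined result stays well-formed:

doit() {
    # decompress, filter, and recompress within the same job
    pigz -dc "$1" | awk -F, '$1 > 100 && $2 > 100' | pigz
}
export -f doit

find /path/to/dir -name '*.csv.gz' -type f | parallel doit > output.csv.gz

Add -k to parallel if the output should follow the order in which find emits the file names.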