Tags: bash, awk, parallel-processing, gnu-parallel

Parallelizing awk script


I’m trying to parallelize the following command:

$ awk -F , '$3 > 25 && $3 < 26' data_temp.csv | head

This gives the desired output, as does cat data_temp.csv | awk -F , '$3 > 25 && $3 < 26' | head. My attempts so far:

$ parallel "awk -F , '$3 > 25 && $3 < 26' data_temp.csv" | head
parallel: Warning: Input is read from the terminal.
parallel: Warning: Only experts do this on purpose. Press CTRL-D to exit.

$ cat data_temp.csv | parallel --pipe awk -F , \'$3 > 25 && $3 < 26\' | awk -F , '$3 > 25 && $3 < 26' | head
sh: -c: line 0: unexpected EOF while looking for matching `''
sh: -c: line 1: syntax error: unexpected end of file
# repeated for what looks like every line

Solution
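
Both attempts appear to fail on shell quoting rather than on awk itself. In the first, the double quotes let the calling shell expand $3 (normally unset in an interactive shell) before parallel ever runs, and because parallel is given no input source it waits on the terminal. In the second, \' produces a literal quote but leaves >, && and < visible to the calling shell, so parallel most likely receives only awk -F , ' and passes that unbalanced quote on to sh. A minimal sketch of the double-quote versus single-quote difference, using a hypothetical show_args helper that is not part of awk or GNU parallel:

```shell
# Print each argument in brackets so word boundaries are visible.
# show_args is a hypothetical helper, not part of awk or GNU parallel.
show_args() { printf '[%s]' "$@"; printf '\n'; }

set --   # clear positional parameters so $3 is unset, as in a fresh interactive shell

# Double quotes: the calling shell expands $3 before the command sees it.
double=$(show_args "awk -F , '$3 > 25 && $3 < 26'")

# Single quotes: the awk program reaches the command untouched.
single=$(show_args awk -F , '$3 > 25 && $3 < 26')

echo "$double"   # [awk -F , ' > 25 &&  < 26']
echo "$single"   # [awk][-F][,][$3 > 25 && $3 < 26]
```

Single quotes alone are not enough with parallel, because it passes the command to a second shell; that is what the -q option is for.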

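  • Untested. -q keeps the awk program quoted when parallel hands it to the shell, -k keeps the output in input order, and --block 100M splits the input into chunks of roughly 100 MB:

    cat data_temp.csv |
      parallel -k -q --block 100M --pipe awk -F , '$3 > 25 && $3 < 26' |
      head

    Alternatively, let parallel read the file directly with --pipepart, which is usually faster than --pipe:

    parallel -k -q --block 100M --pipepart -a data_temp.csv awk -F , '$3 > 25 && $3 < 26' |
      head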
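
Since the commands above are marked untested, one way to check them is to compare the parallel output against plain awk on a small file. The sketch below is only a sanity check under stated assumptions: the five sample rows are made up and merely stand in for data_temp.csv, and the `parallel` found on PATH is assumed to be GNU parallel (the moreutils tool of the same name takes different options).

```shell
# Hypothetical sample data standing in for data_temp.csv (third column is the value filtered on).
tmp=$(mktemp)
printf '%s\n' 'a,1,24.9' 'b,2,25.5' 'c,3,25.0' 'd,4,25.9' 'e,5,26.0' > "$tmp"

# Reference result from plain awk.
serial=$(awk -F , '$3 > 25 && $3 < 26' "$tmp")
echo "$serial"

# Compare with GNU parallel only if a parallel binary is available.
if command -v parallel >/dev/null 2>&1; then
  par=$(parallel -k -q --pipe awk -F , '$3 > 25 && $3 < 26' < "$tmp")
  if [ "$serial" = "$par" ]; then
    echo "outputs match"
  else
    echo "outputs differ"
  fi
else
  echo "parallel not found; showing serial result only"
fi

rm -f "$tmp"
```

On a file this small parallel runs a single job, so the check only confirms that the quoting survives; any speedup would have to be measured on the real data.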