
Pass parallel variable "{}" as awk variable


I want to extract all lines of ids.ped whose second column matches a word in a list (the second column of list_of_words), and print them in the same order as the list.

ids.ped file:

2425 NA19901 0
2472 NA20291 0
2476 NA20298 0
1328 NA06989 0
...

I want to use awk and parallel for this task.

I tried the following:

cut -f2 list_of_words |
    parallel -j35 --keep-order \
    awk -v id={} 'BEGIN{FS=" "}{if($2 == id){print $2,$3}}' ids.ped

However, I get the error

/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `awk -v id= BEGIN{FS=" "}{if($2 == id){print $2,$3}} ids.ped'

It seems I cannot pass {} this way.

Notes:

  • ids.ped is big, that's why I want to parallelize
  • I want to use awk since I want to extract lines according to second column in ids.ped

For a reason I do not understand, grep -w extracts some lines twice; that is one reason I would rather use awk.

Any other answer to solve this problem efficiently is welcome. Thanks.


Solution

  • I wasn't able to reproduce your parameter-passing problem (do you have empty columns at the beginning of the file?), but I did get the syntax error, which comes from how parallel interprets its arguments.

    /opt/local/bin/bash: -c: line 0: syntax error near unexpected token `('
    /opt/local/bin/bash: -c: line 0: `awk -v id=NA20291 BEGIN{FS=" "}{if($2 == id){print $2,$3}} foo.txt'
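
    To see where the quotes go, here is a minimal sketch that does not use parallel at all. The calling shell removes the single quotes around the awk program before any command runs; a tool that then joins its arguments into one string and hands that string to a second shell exposes the bare parentheses and braces. In this sketch printf stands in for parallel's argument joining and sh -c for the second shell (both are assumptions made so the example runs without parallel installed).

    ```shell
    # The calling shell strips the quotes, so each argument arrives bare.
    # printf joins them with spaces, roughly like the command line in the error.
    args=$(printf '%s ' awk -v id=NA20291 'BEGIN{FS=" "}{if($2 == id){print $2,$3}}' ids.ped)
    echo "$args"

    # Parsing the joined string in a second shell fails before awk is ever started.
    if sh -c "$args" 2>/dev/null; then
        status="ok"
    else
        status="syntax error"
    fi
    echo "$status"
    ```

    The first echo prints the awk program without its surrounding quotes, which is the same shape as the command shown in the error message.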
    

    There are three ways to fix the problem. You can add the -q option to parallel to "protect against evaluation by the subshell":

    cut -f2 list_of_words |
        parallel -j35 -q --keep-order \
        awk -v id="{}" 'BEGIN{FS=" "}{if($2 == id){print $2,$3}}' ids.ped
    

    You can move the awk code to a separate file; the rest of the command is simple enough that it doesn't need to be escaped:

    cut -f2 list_of_words |
        parallel -j35 --keep-order awk -v id={} -f foo.awk ids.ped
    

    Contents of foo.awk:

    #!/usr/bin/awk -f
    BEGIN {
        FS=" "
    }
    
    {
        if($2 == id){
            print $2,$3
        }
    }
    

    Or, you can escape the command yourself. The GNU parallel manual says "most people will never need more quoting than putting '\' in front of the special characters."

    cut -f2 list_of_words |
        parallel -j35 --keep-order \
        awk -v id="{}" \''BEGIN{FS=" "}{if($2 == id){print $2,$3}}'\' ids.ped
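
Since the question invites other approaches: a single awk pass avoids starting one awk process per ID. What follows is a minimal sketch, not a drop-in replacement. It assumes list_of_words is tab-separated with the ID in column 2 (as cut -f2 implies), that ids.ped is whitespace-separated, and that the matched lines fit in memory. The sample files and the names order, want and hit are made up for illustration.

```shell
# Hypothetical sample data standing in for the real files.
tmp=$(mktemp -d)
printf '2425 NA19901 0\n2472 NA20291 0\n2476 NA20298 0\n1328 NA06989 0\n' > "$tmp/ids.ped"
printf 'a\tNA20298\nb\tNA19901\n' > "$tmp/list_of_words"

# First file (tab-separated): remember the wanted IDs and their order.
# Second file (whitespace-separated): keep columns 2 and 3 of matching lines.
# END: print the matches in the order of list_of_words.
out=$(awk '
    NR == FNR    { order[++n] = $2; want[$2] = 1; next }
    ($2 in want) { hit[$2] = $2 " " $3 }
    END          { for (i = 1; i <= n; i++) if (order[i] in hit) print hit[order[i]] }
' FS='\t' "$tmp/list_of_words" FS=' ' "$tmp/ids.ped")
echo "$out"
rm -rf "$tmp"
```

Unlike the parallel version, this keeps only the last line seen for an ID that occurs more than once in ids.ped, and it prints nothing for IDs that are missing from ids.ped. Whether it is faster on your data depends on the size of the list, so treat it as something to measure rather than a guaranteed improvement.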