bash, parallel-processing, gnu, gnu-parallel

How to use GNU parallel in bash while reading variables from stdin?


I'm trying to adapt the following lines of code for use with GNU parallel:

for ID in $(cut -f1 markers.tsv); do
    echo "$ID"
    FAA="${ID}.faa.gz"
    zcat "$FAA" | muscle -out "${ID}.msa"
done

I would prefer to do this without creating an intermediate script.

However, the examples I have found do not show where to put my ${ID} argument.

This could be written as a one-liner:

for ID in $(cut -f1 markers.tsv); do
    echo $ID && FAA=${ID}.faa.gz && zcat ${FAA} | muscle -out ${ID}.msa
done

I tried the following, but it does not appear to run the jobs simultaneously:

cut -f1 markers.tsv | parallel -j 16 -I @ 'echo "@" && FAA="@.faa.gz" && zcat $FAA | muscle -out @.msa'

Can someone help me adapt this so that it correctly runs 16 jobs at once?

Example markers.tsv (tab-separated; \t denotes a tab character):

PF00709.21\t1\ta
PF00406.22\t2\tb
PF01808.18\t3\tc

Solution

  • Due to a bug in GNU Parallel, an input line cannot be longer than the maximum command-line length.

    cut -f1 markers.tsv |
      parallel -j16 'echo {} && zcat {}.faa.gz | muscle -out {}.msa'
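
To check how the command template expands before launching real jobs, something like the following may help. It is only a sketch under a few assumptions: GNU Parallel (not the moreutils parallel) is on the PATH, a throwaway copy of the example markers.tsv is written to a temporary directory, and the variable names workdir and ids are made up for illustration. {} is GNU Parallel's default replacement string, so each line read from stdin is substituted wherever {} appears. --dry-run prints the composed commands without running zcat or muscle, and -k keeps the output in input order.

```shell
#!/usr/bin/env bash
# Throwaway copy of the example markers.tsv (real tab characters).
workdir=$(mktemp -d)   # hypothetical scratch directory
printf 'PF00709.21\t1\ta\nPF00406.22\t2\tb\nPF01808.18\t3\tc\n' \
    > "$workdir/markers.tsv"

# The IDs that parallel will read from stdin, one per line.
ids=$(cut -f1 "$workdir/markers.tsv")

# Preview the commands only if a parallel binary is available
# (assumed to be GNU Parallel).
if command -v parallel >/dev/null 2>&1; then
    printf '%s\n' "$ids" |
      parallel -k --dry-run -j16 \
        'echo {} && zcat {}.faa.gz | muscle -out {}.msa'
fi

# Remove the scratch directory.
rm -rf "$workdir"
```

With the three example IDs, the dry run should print one composed command per ID, with the ID in place of each {}. Dropping --dry-run (and pointing cut at the real markers.tsv) runs the jobs, up to 16 at a time.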