Tags: bash, parallel-processing, gnu, gnu-parallel

GNU parallel ignores piped commands


I'm ultimately trying to use parallel as a simple job queue manager, a la here. The idea seems to be to put the commands in a file, have tail read the file (using the -f option so that it keeps looking for new lines), then pipe the output of tail into parallel. So I try

true > jobqueue; tail -n+0 -f jobqueue | parallel
echo echo {} ::: a b c >> jobqueue

but nothing happens. OK... to test things, I then just try

cat jobqueue | parallel

which gives

{} ::: a b c

Meanwhile

parallel echo {} ::: a b c

correctly outputs

a
b
c

So why does parallel ignore the parallel-ish syntax when it was fed from a file, but runs fine when it's given the command directly?

FWIW this is version 20160722, and since I don't have root access on the machine I had to build from source and install into my home directory.


Solution

  • So why does parallel ignore the parallel-ish syntax when it was fed from a file, but runs fine when it's given the command directly?

    Because that's what it is specified to do. What you're characterizing as "syntax" is defined in the manual as various command-line arguments and parts thereof. These seem mostly targeted at the case where the command to parallelize is given on parallel's command line, and the program input consists of data to operate upon. This is the mode of operation of the xargs program, which was one of the inspirations for parallel.

    The fact is, you're making things more complicated than they need to be. When you run parallel without specifying a command on its command line, each line of its input is run as a complete command in its own right, which is why the line echo {} ::: a b c simply printed {} ::: a b c. Such commands don't need the kind of input-line manipulation that parallel itself offers, and they can't, in general, take arguments any other way than on their own command line. In that mode, you just feed parallel the exact commands you want it to run:

    true > jobqueue; tail -n+0 -f jobqueue | parallel
    # the pipeline above keeps running, so append from a second shell in the same directory:
    echo echo a b c >> jobqueue
    

    or

    true > jobqueue; tail -n+0 -f jobqueue | parallel
    # again from a second shell, one job per line:
    echo echo a >> jobqueue
    echo echo b >> jobqueue
    echo echo c >> jobqueue
    

    depending on what exactly you're after.

    As for nothing seeming to happen when you use tail -f to feed input to parallel, I'm inclined to think that parallel is waiting for more input. Its first reads do not return enough data to trigger it to dispatch any jobs, but its standard input is still open, so it has reason to expect more input (which is indeed appropriate for a queue). If you continue to feed it jobs, it will soon get enough input to start running them. When you're ready to shut down the queue, you must kill the tail command so that parallel knows it has reached the end of its input.
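
To make the shutdown step concrete, here is a minimal sketch of the tail-fed queue wrapped in three shell functions. Everything named here (start_queue, enqueue_each, stop_queue, and the variables they set) is hypothetical and not part of parallel. The FIFO is my own addition: it is just one way to learn tail's PID so that it can be killed later. enqueue_each writes one complete command line per argument, doing in the shell the expansion that ::: would do on parallel's own command line.

```shell
# Sketch of a tail-fed job queue. All function and variable names are
# hypothetical helpers for illustration, not part of GNU parallel.

# start_queue JOBFILE [CONSUMER...]
# Follows JOBFILE with tail and feeds every line to the consumer
# (default: parallel) through a FIFO, so that tail's PID is known.
start_queue() {
    queue_file=$1; shift
    [ "$#" -gt 0 ] || set -- parallel
    queue_dir=$(mktemp -d) || return 1
    mkfifo "$queue_dir/fifo" || return 1
    : > "$queue_file"                       # start with an empty queue
    tail -n+0 -f "$queue_file" > "$queue_dir/fifo" &
    tail_pid=$!
    "$@" < "$queue_dir/fifo" &
    consumer_pid=$!
}

# enqueue_each COMMAND ARG...
# Appends one complete command line per argument to the queue file.
# Arguments are not quoted, so this is only safe for simple words.
enqueue_each() {
    cmd=$1; shift
    for arg in "$@"; do
        printf '%s %s\n' "$cmd" "$arg"
    done >> "$queue_file"
}

# stop_queue
# Kills tail so the consumer sees end of input, then waits for the
# consumer to finish its remaining jobs and removes the FIFO.
stop_queue() {
    kill "$tail_pid"
    wait "$consumer_pid"
    wait "$tail_pid" 2>/dev/null || :
    rm -rf "$queue_dir"
}
```

Assuming the functions are loaded in the current shell, start_queue jobqueue followed by enqueue_each echo a b c queues the lines echo a, echo b and echo c. Give tail a moment to forward the last lines before calling stop_queue, since killing it immediately could drop lines it has not yet read.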