
GNU Parallel output to stdout using --round-robin


I'm trying to use GNU Parallel to help me process some remote files that I don't want to save locally.

My command looks something like this:

python list_files.py | \
  parallel -j5 'aws s3 cp s3://s3-bucket/{} -' | \
    parallel -j5 --round --pipe -l 5000 "python process_and_print.py"

process_and_print.py prints output for some input lines, but that output doesn't reach stdout immediately as I expected; instead, I only see it after the process has finished. If I remove the --round parameter, it all works as expected.

Where does all that data get saved? Do I have a way to print it to stdout, line by line, without buffering?


Solution

  • Where does all that data get saved?

    GNU Parallel buffers output in temporary files in $TMPDIR (or the directory given by --tmpdir), which defaults to /tmp. You cannot see the files because they are removed immediately (but kept open), so that you do not have to clean up if GNU Parallel is killed.

    Do I have a way to print it to stdout, line by line,

    Use --line-buffer.

    without buffering?

    -u disables buffering altogether, but then output is no longer guaranteed to arrive as whole lines: output from different jobs can be mixed mid-line.

    Since version 20170822, --line-buffer buffers a full line in memory and thus does not buffer in /tmp.
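
A minimal sketch of the --line-buffer suggestion, assuming GNU Parallel is installed. The names here are stand-ins and not part of the question: seq replaces the S3 download stage, a small read loop replaces process_and_print.py, and run_demo is a hypothetical helper. The likely reason the output appeared only at the end is that with --round-robin each job keeps running until the input is exhausted, and the default grouped mode prints a job's output when that job finishes.

```shell
# Hypothetical stand-in for the question's pipeline: seq replaces the S3
# download stage and a small read loop replaces process_and_print.py.
run_demo() {
  # $1 is the output-mode flag under test: --line-buffer or -u.
  seq 1 10 |
    parallel --pipe --round-robin -j2 "$1" \
      'while read -r n; do echo "processed $n"; done'
}

# Only run when GNU Parallel (not another tool named parallel) is on PATH.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  run_demo --line-buffer
else
  echo "GNU Parallel not found; skipping demo" >&2
fi
```

Calling run_demo -u instead shows the ungrouped mode. For the command in the question, the equivalent change would be adding --line-buffer to the second parallel invocation, next to --round --pipe. The order of the printed lines is not fixed, since lines from different jobs may interleave.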