linux, parallel-processing, gnu, gnu-parallel

What does "partial record" mean in this statement from the GNU Parallel tutorial?


The GNU Parallel tutorial makes this statement:

One of the 4 instances got a single record, 2 instances got 2 full records each, and one instance got 1 full and 1 partial record.

But just before that statement, this seemingly conflicting statement is made:

The size of the chunk is not exactly 1 MB because GNU parallel only passes full lines - never half a line, thus the blocksize is only 1 MB on average.

I'm asking because I'm seeing what looks like a partial record being sent to one of the receiving programs, which breaks it.

So is a partial record/line (some of it, but not all of it) ever sent to the receiving stream/program, or not?


Solution

  • You have found a bug in the documentation. Congrats.

    It should read block instead of record.

    The partial block is the last block of the input, which is smaller than 1 MB. In the example this last "tail" block is roughly 583 KBytes.

    num1000000 is 6888896 bytes = 6 MB + about 583 KBytes (with 1 MB = 1048576 bytes, 6 MB is 6291456 bytes, which leaves 597440 bytes).

    Everything is sent to the program, including the last block. You can convince yourself of this by running the following commands, which should all print the same line, word, and byte counts:

    cat num1000000 | wc
    cat num1000000 | parallel --pipe --block 2M cat | wc
    cat num1000000 | parallel --pipe --block 1M cat | wc
    cat num1000000 | parallel --pipe --block 123456 cat | wc
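
To make the block arithmetic and the "full lines only" claim concrete, here is a minimal sketch. It makes a few assumptions: that `--block 1M` means 1048576 bytes, that num1000000 has the same content as the output of `seq 1000000`, and that GNU parallel (not another tool named `parallel`) is on the PATH. The variable names are only illustrative. The first part computes how many full blocks fit and how big the nominal tail is. The second part counts blocks whose last byte is not a newline, which is what a partial line at the end of a block would look like.

```shell
#!/bin/sh
# Size of num1000000 as stated in the answer above.
total=6888896
# Assumption: the upper-case M suffix of --block means 1024*1024 bytes.
block=$((1024 * 1024))

full=$((total / block))        # number of complete 1 MB blocks
tail_bytes=$((total % block))  # nominal size of the last, smaller block
echo "full blocks: $full, nominal tail block: $tail_bytes bytes"

# For every block, "tail -c 1 | wc -l" prints 1 if the block ends with a
# newline and 0 otherwise; awk counts the blocks that do not print 1.
partial=""
if command -v parallel >/dev/null 2>&1; then
  partial=$(seq 1000000 |
    parallel --pipe --block 1M 'tail -c 1 | wc -l' |
    awk '$1 != 1 { n++ } END { print n + 0 }')
  echo "blocks ending mid-line: $partial"
else
  echo "parallel not found; block boundaries not checked"
fi
```

The real tail block can differ from the nominal figure by a few bytes, because every block is cut at a line boundary rather than at exactly 1 MB. If the second count is anything other than 0, the split itself is at fault. Otherwise the partial record seen by the receiving program probably comes from somewhere else, for example a record that spans several lines.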