c, compression, lz4

Explain lz4 double buffer example


Among the lz4 examples there is one named blockStreaming_doubleBuffer.c (https://github.com/Cyan4973/lz4/blob/master/examples/blockStreaming_doubleBuffer.c). It declares char inpBuf[2][BLOCK_BYTES] for its read-compress loop and uses inpBuf[0] and inpBuf[1] alternately.

I cannot understand the benefit of this. Why not use a single buffer? What am I missing?


Solution

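  • The benefit of the double buffer is a better compression ratio. While one block is being compressed, the previous block is still intact in the other half of the buffer, so the compressor can reference matches in it. This is only useful if you don't have enough memory to hold your entire object/file as a single block.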

    This is not obvious, so it deserves a comparison to check it.

    You can try this exercise if you want to see the effect more directly:

    1) Compress a file by cutting it into blocks of 4 KB and compressing each block independently. Note the final compression ratio.

    2) Compress the same file, but using a double buffer with 2 blocks of 4 KB, following the same methodology as the example. Note the final compression ratio; it should be greatly improved.

    3) For a fairer comparison, redo test 1 with independent 8 KB blocks, so that tests 2 and 3 use the same amount of memory. You should, once again, notice that test 2 offers the better compression ratio.

    4) The ratio difference is even more pronounced if using the "HC" version of LZ4, rather than the "fast" one.
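The effect that tests 1 to 3 measure can be sketched without LZ4 itself. The following is only a toy stand-in, not LZ4's match finder: it greedily counts how many input bytes could be encoded as matches of at least 4 bytes (LZ4's minimum match length), first when a block may only reference itself, then when it may also reference the previous block. The names `matched_bytes`, `matched_independent` and `matched_chained` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_MATCH 4u  /* LZ4's minimum match length */

/* Toy greedy match finder (NOT LZ4's algorithm): counts the bytes of
 * data[start, end) covered by matches of at least MIN_MATCH bytes whose
 * source begins at or after window_start and before the current position. */
static size_t matched_bytes(const char *data, size_t window_start,
                            size_t start, size_t end)
{
    size_t covered = 0;
    size_t pos = start;

    assert(window_start <= start && start <= end);

    while (pos + MIN_MATCH <= end) {
        size_t best = 0;
        for (size_t s = window_start; s < pos; s++) {
            size_t len = 0;
            while (pos + len < end && data[s + len] == data[pos + len])
                len++;
            if (len > best)
                best = len;
        }
        if (best >= MIN_MATCH) {   /* long enough: count it as a match */
            covered += best;
            pos += best;
        } else {                   /* otherwise this byte stays a literal */
            pos++;
        }
    }
    return covered;
}

/* Tests 1 and 3: every block may only reference data inside itself. */
size_t matched_independent(const char *data, size_t size, size_t block)
{
    size_t total = 0;
    assert(block > 0);
    for (size_t start = 0; start < size; start += block) {
        size_t end = (size - start < block) ? size : start + block;
        total += matched_bytes(data, start, start, end);
    }
    return total;
}

/* Test 2: every block may also reference the block just before it. */
size_t matched_chained(const char *data, size_t size, size_t block)
{
    size_t total = 0;
    assert(block > 0);
    for (size_t start = 0; start < size; start += block) {
        size_t end = (size - start < block) ? size : start + block;
        size_t window_start = (start >= block) ? start - block : 0;
        total += matched_bytes(data, window_start, start, end);
    }
    return total;
}
```

For a 64-byte input made of the 16-byte pattern "abcdefghijklmnop" repeated four times, `matched_independent(d, 64, 16)` finds 0 matched bytes, `matched_chained(d, 64, 16)` finds 48, and `matched_independent(d, 64, 32)` finds 32. The chained 16-byte blocks beat the independent 32-byte blocks even though both keep 32 bytes of input in memory, because an independent block always starts with no history.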

    So, to summarize:

    • If you have enough memory to hold your whole object/file, you don't need this method.
    • If you have to cut your input data into smaller blocks, you can get a better compression ratio by using a double buffer rather than independent blocks. The downside is that it is more complex to set up.
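As for why a single buffer is not enough: the sketch below, which assumes nothing about LZ4 internals, walks an input in blocks the way the example's read loop does and counts how often the previous block is still intact when the current one has been loaded. `blocks_with_intact_history` is a hypothetical name, `memcpy` stands in for `fread`, and `BLOCK_BYTES` is deliberately tiny here. In the real example, the call made at the marked point is `LZ4_compress_fast_continue`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK_BYTES = 4 };  /* tiny on purpose; only for this illustration */

/* Loads src block by block into num_buffers (1 or 2) rotating buffers and
 * returns how many blocks still had their predecessor intact in memory at
 * the moment they would be handed to a streaming compressor. */
size_t blocks_with_intact_history(const char *src, size_t src_len,
                                  int num_buffers)
{
    char inpBuf[2][BLOCK_BYTES];
    int cur = 0;
    size_t intact = 0;
    size_t prev_off = 0, prev_len = 0;

    assert(num_buffers == 1 || num_buffers == 2);

    for (size_t off = 0; off < src_len; ) {
        size_t len = src_len - off;
        if (len > BLOCK_BYTES)
            len = BLOCK_BYTES;

        memcpy(inpBuf[cur], src + off, len);  /* stand-in for fread() */

        if (prev_len > 0) {
            /* Where the previous block was stored. With one buffer this is
             * the slot that has just been overwritten. */
            const char *prev = inpBuf[(cur + num_buffers - 1) % num_buffers];
            if (memcmp(prev, src + prev_off, prev_len) == 0)
                intact++;
        }

        /* <-- a streaming compressor would process inpBuf[cur] here */

        prev_off = off;
        prev_len = len;
        off += len;
        cur = (cur + 1) % num_buffers;        /* alternate 0, 1, 0, 1, ... */
    }
    return intact;
}
```

With the 12-byte input "abcdefghijkl", two buffers keep the previous block intact for both later blocks, while a single buffer keeps it for none: reading the next block destroys the history the compressor would want to reference.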