zip compression gzip deflate

When to form a new DEFLATE block?


When compressing a file or directory into a zip file using DEFLATE, when should a new DEFLATE block be formed? Furthermore, since the maximum code length is 15 bits in DEFLATE, should a new block be formed whenever the Huffman tree exceeds a depth of 15? Thanks!


Solution

    1. Whenever you like, but not too often.
    2. No. You can squash the Huffman tree.

    zlib emits a deflate block once a set number of symbols (literals plus length/distance pairs) has been generated. By default, that number is 16383. It can be changed through the memory usage option (memLevel). At the end, the last block holds whatever remains.
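
As a rough illustration of the memory usage option, here is a minimal Python sketch. It assumes, from my reading of zlib's deflate.c, that the symbol buffer holds `1 << (memLevel + 6)` entries and that a block is closed one symbol short of that, which matches the 16383 default for memLevel 8. The helper names `symbols_per_block` and `raw_deflate` are made up for this example.

```python
import zlib

def symbols_per_block(mem_level):
    """Symbol count at which zlib closes a block (assumed formula, see above)."""
    return (1 << (mem_level + 6)) - 1

def raw_deflate(data, mem_level=8, level=6):
    """Compress to a raw DEFLATE stream using the given memLevel (1..9)."""
    # wbits=-15 selects raw deflate with a 32K window (no zlib/gzip wrapper).
    comp = zlib.compressobj(level, zlib.DEFLATED, -15, mem_level)
    return comp.compress(data) + comp.flush()

# Hypothetical sample data, only for comparing output sizes.
sample = b"".join(
    b"record %d: some moderately repetitive text\n" % i for i in range(5000)
)

for mem_level in (1, 8, 9):
    size = len(raw_deflate(sample, mem_level))
    print(mem_level, symbols_per_block(mem_level), size)
```

Lower memLevel values mean fewer symbols per block, so the same input is cut into more blocks; the printed sizes show how much that matters for this particular input.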

    zopfli tries to be more intelligent by making large blocks and splitting them so long as the compression ratio goes up, stopping when the next split would make the compression ratio go down.
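
The split-while-it-helps idea can be sketched with a toy cost model. This is not zopfli's actual algorithm: the cost function (an assumed flat `HEADER_BITS` per block plus the zeroth-order entropy of the block's bytes), the candidate `step`, and the names `block_cost` and `find_splits` are all assumptions made for illustration.

```python
import math
from collections import Counter

HEADER_BITS = 500  # assumed flat cost of one dynamic block header, in bits

def block_cost(symbols):
    """Estimated bits for one block: header plus entropy-coded symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    entropy_bits = -sum(c * math.log2(c / n) for c in counts.values())
    return HEADER_BITS + entropy_bits

def find_splits(symbols, start=0, end=None, step=64):
    """Recursively split [start, end) while a split lowers the estimated cost."""
    if end is None:
        end = len(symbols)
    best_cost = block_cost(symbols[start:end])
    best_pos = None
    # Try candidate split points every `step` symbols.
    for pos in range(start + step, end, step):
        cost = block_cost(symbols[start:pos]) + block_cost(symbols[pos:end])
        if cost < best_cost:
            best_cost, best_pos = cost, pos
    if best_pos is None:
        return []  # no split improves the estimate: keep one block
    left = find_splits(symbols, start, best_pos, step)
    right = find_splits(symbols, best_pos, end, step)
    return left + [best_pos] + right

# Two regions with different statistics: one split pays for its extra header.
print(find_splits(b"a" * 2048 + b"b" * 2048))  # [2048]
# Uniform statistics throughout: splitting only adds header cost.
print(find_splits(b"ab" * 2048))               # []
```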

    You don't want deflate blocks to be too small, because then the size of the dynamic header describing the codes used in the block will become a significant factor in the size, reducing the compression ratio. You don't want the blocks to be too large, because then the codes, fixed for the duration of the block, will not be able to adapt to local statistical variations in the data being compressed.
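
The cost of very small blocks can be observed with Python's `zlib` module by forcing a block boundary after every chunk. This is only a rough experiment: `Z_SYNC_FLUSH` ends the current block but also appends an empty stored block, so it overstates the pure header cost, and for tiny blocks zlib may pick fixed codes instead of a dynamic header. The function name and sample data are made up for this example.

```python
import zlib

def deflate_with_forced_blocks(data, chunk_size):
    """Raw DEFLATE where the current block is ended after every chunk."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    out = []
    for start in range(0, len(data), chunk_size):
        out.append(comp.compress(data[start:start + chunk_size]))
        # Ends the block (and adds an empty stored block as a marker).
        out.append(comp.flush(zlib.Z_SYNC_FLUSH))
    out.append(comp.flush())  # Z_FINISH: terminate the stream
    return b"".join(out)

# Hypothetical sample data, only for comparing output sizes.
text = b"".join(b"line %d of sample text\n" % i for i in range(2000))

for chunk_size in (64, 1024, 16384, len(text)):
    print(chunk_size, len(deflate_with_forced_blocks(text, chunk_size)))
```

On input like this, the output should grow noticeably as the chunks get smaller, since each forced boundary pays per-block overhead again.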

    As for the maximum depth, zlib and other deflators will happily build a code with the normal Huffman algorithm even when its depth comes out greater than 15. They then squash the code down so that its maximum depth is 15.
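
One way to squash a code is sketched below. It is not zlib's exact procedure, just a simple heuristic that shows the idea: clamp every length to the limit, then lengthen other codes until the Kraft sum fits again. The function names are made up, and the sketch assumes there are at most `2 ** max_len` used symbols. A limit of 4 is used in the demo so the effect is visible with only eight symbols; DEFLATE would use 15.

```python
import heapq

def huffman_lengths(freqs):
    """Unrestricted Huffman code lengths (0 for unused symbols)."""
    heap = [(f, i, [i]) for i, f in enumerate(freqs) if f > 0]
    lengths = [0] * len(freqs)
    if len(heap) == 1:
        lengths[heap[0][2][0]] = 1
        return lengths
    heapq.heapify(heap)
    tie = len(freqs)  # unique tiebreaker so the lists are never compared
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every symbol under the new node gets one bit deeper
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

def limit_lengths(lengths, freqs, max_len):
    """Squash code lengths so that none exceeds max_len."""
    out = [min(l, max_len) for l in lengths]
    used = [i for i, l in enumerate(out) if l > 0]
    full = 1 << max_len
    # Kraft sum scaled by 2**max_len; a valid prefix code needs kraft <= full.
    kraft = sum(1 << (max_len - out[i]) for i in used)
    # Pass 1: while oversubscribed, lengthen the longest code that is still
    # shorter than max_len (rarest symbol first on ties).
    while kraft > full:
        i = max((i for i in used if out[i] < max_len),
                key=lambda i: (out[i], -freqs[i]))
        out[i] += 1
        kraft -= 1 << (max_len - out[i])
    # Pass 2: if pass 1 overshot, shorten codes of frequent symbols where
    # doing so still fits.
    for i in sorted(used, key=lambda i: -freqs[i]):
        while out[i] > 1 and kraft + (1 << (max_len - out[i])) <= full:
            kraft += 1 << (max_len - out[i])
            out[i] -= 1
    return out

fib = [1, 1, 2, 3, 5, 8, 13, 21]           # skewed counts give a deep tree
deep = huffman_lengths(fib)
print(deep)                                 # [7, 7, 6, 5, 4, 3, 2, 1]
print(limit_lengths(deep, fib, 4))          # [4, 4, 4, 4, 4, 4, 3, 1]
```

The squashed code is slightly less efficient than the unrestricted one, but it is still a valid prefix code, so there is no need to start a new block just because the tree came out too deep.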