Search code examples
zlibhuffman-code

in zlib what happen when the Huffman code lengths for the alphabets exceed maximum code length(15)?


in https://www.rfc-editor.org/rfc/rfc1951

Note that in the "deflate" format, the Huffman codes for the
     various alphabets must not exceed certain maximum code lengths.

the max code length definition is 15.

what happens when the Huffman code length exceed 15?

from https://cs.stackexchange.com/questions/75542/maximum-size-of-huffman-codes-for-an-alphabet-containing-256-letters The maximum possible code size for a 256 symbol alphabet is 256 bits. Consider the case when the most frequent symbol has frequency 1/2, the next most frequent symbol has frequency 1/4, then 1/8

so in literal/length alphabet the max Huffman code length is 285-1=284 but in zlib the max code length is 15.

  1. why 15 was choose as the max code length?
  2. will zlib fail is code length exceed 15?

Solution

    1. We don't know for sure why Phil Katz chose 15, but it was likely to facilitate a fast implementation in a 16-bit processor.

    2. No, zlib will not fail. It happens all the time. The zlib implementation applies the normal Huffman algorithm, after which if the longest code is longer than 15 bits, it proceeds to modify the codes to force them all to 15 bits or less.

    The maximum possible Huffman code size for a 256-symbol alphabet is 255 bits, not 256. The last two symbols have the same length, 255.

    You will not have a case though where the input would result in a code that long. You would need a minimum input size of about 3x1053 symbols in order to arrive at a 255-bit-long code. I'm thinking you don't have enough memory for that.

    In any case, zlib normally limits a deflate block to 16384 symbols. For that number, the maximum Huffman code length is 19. That comes from a Fibonacci-like (Lucas) sequence of probabilities, not your powers of two. (Left as an exercise for the reader.)