zlib, deflate

Does the Huffman code tree in the Deflate algorithm have to be a complete tree?


Does the Huffman code tree in the Deflate algorithm have to be a complete tree? By a complete tree, I mean one in which every leaf node represents a symbol. In other words, the last symbol with the longest code is assigned the all-ones code.

Take an extreme case as an example: given 286 symbols, each one encoded with a 15-bit code, which is a valid prefix code in general. In this case, however, 2^15 - 286 = 32482 of the possible 15-bit codes (leaf nodes) are never assigned. Is that allowed in Deflate? My impression is that it is not, and that the tree must be complete. Is that true?
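To make the "all ones" criterion concrete, here is a minimal Python sketch of the canonical code assignment described in RFC 1951, section 3.2.2, which Deflate uses to turn a list of code lengths into actual bit patterns. The function name `canonical_codes` is an illustrative assumption, not a zlib function. With a complete set of lengths (the RFC's own example, 3,3,3,3,3,2,4,4), the last symbol of the longest length receives 1111; with the question's 286 fifteen-bit lengths, the last symbol receives 000000100011101, and every code from 000000100011110 up to all ones is left unused.

```python
def canonical_codes(lengths):
    """Assign Deflate-style canonical codes (RFC 1951, section 3.2.2).

    lengths[i] is the code length of symbol i in bits; 0 means unused.
    Returns a dict mapping symbol -> code as a string of '0'/'1'.
    """
    max_bits = max(lengths)
    # Step 1: count how many symbols use each code length (length 0 is skipped).
    bl_count = [0] * (max_bits + 1)
    for length in lengths:
        if length:
            bl_count[length] += 1
    # Step 2: compute the smallest code value for each length.
    code = 0
    next_code = [0] * (max_bits + 1)
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: give consecutive values to symbols of equal length, in symbol order.
    codes = {}
    for symbol, length in enumerate(lengths):
        if length:
            codes[symbol] = format(next_code[length], f"0{length}b")
            next_code[length] += 1
    return codes


complete = canonical_codes([3, 3, 3, 3, 3, 2, 4, 4])
print(complete[5], complete[7])   # 00 1111  (complete: last code is all ones)

incomplete = canonical_codes([15] * 286)
print(incomplete[285])            # 000000100011101  (all ones is never reached)
```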


Solution

  • With one exception, the Huffman codes described in a dynamic block of a valid deflate stream must be complete, meaning no bit pattern is left unassigned. This applies to all three codes: the code lengths code, the literal/length code, and the distance code.

    The one exception: if only one distance symbol is used, it is coded with one bit (a zero) rather than zero bits, which leaves one code (the single bit being a one) unused.
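The rule above can be expressed with the Kraft sum: a set of code lengths is complete exactly when the sum of 2^-length over the used symbols equals 1, incomplete when it is less, and oversubscribed (not a valid prefix code) when it is more. Below is a minimal Python sketch of such a check, assuming the code lengths have already been decoded into a list. `classify_code` and its status strings are hypothetical names for illustration, not any library's API. As an addition beyond the answer, it also accepts an all-zero distance code, following the RFC 1951 note that a single zero-bit distance code means the block uses no distances at all.

```python
from fractions import Fraction


def classify_code(lengths, is_distance_code=False):
    """Check a list of Deflate code lengths (0 = symbol unused).

    Returns (status, allowed): status is 'complete', 'incomplete',
    'oversubscribed', or 'empty'; allowed says whether the stream
    should be accepted under the completeness rule.
    """
    used = [n for n in lengths if n]
    if not used and is_distance_code:
        # RFC 1951: one distance code of zero bits means no distance codes
        # are used at all (the block is all literals).
        return "empty", True
    # Kraft sum: a used code of length n occupies 2**-n of the code space.
    kraft = sum(Fraction(1, 2 ** n) for n in used)
    if kraft > 1:
        return "oversubscribed", False
    if kraft == 1:
        return "complete", True
    # Incomplete: only a distance code with a single 1-bit symbol is allowed.
    return "incomplete", is_distance_code and used == [1]


print(classify_code([15] * 286))                             # ('incomplete', False)
print(classify_code([1] + [0] * 29, is_distance_code=True))  # ('incomplete', True)
```

For the question's example, the sum is 286/32768, well below 1, so the code is incomplete and a conforming decoder rejects it.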