Search code examples
deflate

will the length alphabet in the deflate algorithm always consist of codes > 256?


RFC1951 says "5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)" and "HLIT + 257 code lengths for the literal/length alphabet, encoded using the code length Huffman code".

So the literals and lengths use the same alphabet and distances use their own distinct alphabet.

It makes sense for literals to be between 0-255. The end of block character is 256, which can't be represented by an 8-bit byte and since anything between 257 and 286 can't be represented in a single byte it doesn't seem unreasonable to conclude that any code in that range is a length code?

There's also the matter of the actual lengths that the length codes represent. In the example I'm looking it it looks like you can determine the actual length (5) from the length code (259) by subtracting 254 I could just take the length code and subtract 254 from it but I'm guessing length codes could exist for which that approach wouldn't work?


Solution

  • Yes. From the RFC:

                 Extra               Extra               Extra
            Code Bits Length(s) Code Bits Lengths   Code Bits Length(s)
            ---- ---- ------     ---- ---- -------   ---- ---- -------
             257   0     3       267   1   15,16     277   4   67-82
             258   0     4       268   1   17,18     278   4   83-98
             259   0     5       269   2   19-22     279   4   99-114
             260   0     6       270   2   23-26     280   4  115-130
             261   0     7       271   2   27-30     281   5  131-162
             262   0     8       272   2   31-34     282   5  163-194
             263   0     9       273   3   35-42     283   5  195-226
             264   0    10       274   3   43-50     284   5  227-257
             265   1  11,12      275   3   51-58     285   0    258
             266   1  13,14      276   3   59-66