Search code examples
compressionhuffman-codedeflate

decompressor for deflate comprssed-data with fixed Huffman codes


I want to write a decompressor for deflate compressed-data with fixed Huffman codes. form the specification:

     BTYPE specifies how the data are compressed, as follows:

        00 - no compression
        01 - compressed with fixed Huffman codes
        10 - compressed with dynamic Huffman codes
        11 - reserved (error)

     The only difference between the two compressed cases is how the
     Huffman codes for the literal/length and distance alphabets are
     defined.

I want the decompressor to decompress the data when BTYPE=01 I know that I have to start by decoding the Huffman code then decompress the lz77 but when BTYPE=01 the Huffman tree isn’t stored with the compressed data

so how do I decode the Huffman code without having the tree ??

edited:

so the Huffman code will be something like this :

0 110000
1 110001
2 110010
144 110010000
145 110010001
255 111111111
256 0
257 1
258 10
259 11
260 100
279 10111
280 11000000
287 11000111

what I don’t get is if I encountered the code 10 how I can distinguish between the value 2 from the Distance codes and the value 258 since the values 0-23 and 256-297 have the same code


Solution

  • The fixed Huffman codes are pre-defined. From RFC 1951:

    3.2.6. Compression with fixed Huffman codes (BTYPE=01)
    
         The Huffman codes for the two alphabets are fixed, and are not
         represented explicitly in the data.  The Huffman code lengths
         for the literal/length alphabet are:
    
                   Lit Value    Bits        Codes
                   ---------    ----        -----
                     0 - 143     8          00110000 through
                                            10111111
                   144 - 255     9          110010000 through
                                            111111111
                   256 - 279     7          0000000 through
                                            0010111
                   280 - 287     8          11000000 through
                                            11000111
    
         The code lengths are sufficient to generate the actual codes,
         as described above; we show the codes in the table for added
         clarity.  Literal/length values 286-287 will never actually
         occur in the compressed data, but participate in the code
         construction.
    
         Distance codes 0-31 are represented by (fixed-length) 5-bit
         codes, with possible additional bits as shown in the table
         shown in Paragraph 3.2.5, above.  Note that distance codes 30-
         31 will never actually occur in the compressed data.