Search code examples
compressionimplementationdecodinghuffman-codelookup-tables

Huffman table lookup at end of stream


I just had a look in efficient table based Huffman decoding algorithms. A short description can be found here: https://stackoverflow.com/a/29902248/1421332 (using two-level tables). Main goal is to reduce decoding time by spending a limited amount of memory. I think I understood how these algorithms work.

What I don't understand: Every real bit stream is finite. This means at some point (at the end of the stream) it will not have 9bit left (which are needed as index for the LUT in case of a 9bit LUT) but possibly only a few bits for a shorter code. Using these bits as an index will not work. I could check the number of remaining bits every time I get bits from the stream and append enough 0-bits to always get a full index. However this adds a comparison operation to each code look-up which is a waste since it only has an effect for the last few bits of a potentially large stream.

Is there a more efficient way to handle that issue?


Solution

  • "Using these bits as an index will not work."

    Actually, yes it will.

    For inflate the bits start with the least significant. Then your bit accumulator has the available bits in at the bottom, and zeros or junk above that. It does not matter what those bits are. Use that in your look up table. If the resulting decoded code length is less than or equal to the number of available bits, then process that code, discard the bits from the accumulator, and repeat.

    For a correct stream, you will end up with zero bits in the accumulator. If you get a decoded code whose length is greater than the number of available bits left, then you have an invalid stream. Those last available bits from your stream have been determined to be the prefix of a longer code.