Search code examples
clzw

File compression and codes


I'm implementing a version of lzw. Let's say I start off with 10 bit codes and increase whenever I max out on codes. For example after 1024 codes, I'll need 11 bits to represent 1025. Issue is in expressing the shift.

How do I tell decode that I've changed the code size? I thought about using 00, but the program can't distinguish between 00 as an increment and 00 as just two instances of code zero.

Any suggestions?


Solution

  • You don't. You shift to a new size when the dictionary is full. The decoder's dictionary is built synchronized with the encoder's dictionary, so they'll both be full at the same time, and the decoder will shift to the new size exactly when the encoder does.

    The time you have to send a code to signal a change is when you've filled the dictionary completely -- you've used all of the largest codes available. In this case, you generally want to continue using the dictionary until/unless the compression rate starts to drop, then clear the dictionary and start over. You do need to put some marker in to tell when that happens. Typically, you reserve the single largest code for this purpose, but any code you don't use for any other purpose will work.

    Edit: as an aside, note that you normally want to start with codes exactly one bit larger than the codes for the input, so if you're compressing 8-bit bytes, you want to start with 9 bit codes.