Search code examples
compressionbytelzw

How do I represent an LZW output in bytes?


I found an implementation of the LZW algorithm and I was wondering how can I represent its output, which is an int list, to a byte array.

I had tried with one byte but in case of long inputs the dictionary has more than 256 entries and thus I cannot convert.

Then I tried to add an extra byte to indicate how many bytes are used to store the values, but in this case I have to use 2 bytes for each value, which doesn't compress enough.

How can I optimize this?


Solution

  • As bits, not bytes. You just need a simple routine that writes an arbitrary number of bits to a stream of bytes. It simply keeps a one-byte buffer into which you put bits until you have eight bits. Then write than byte, clear the buffer, and start over. The process is reversed on the other side.

    When you get to the end, just write the last byte buffer if not empty with the remainder of the bits set to zero.

    You only need to figure out how many bits are required for each symbol at the current state of the compression. That same determination can be made on the other side when pulling bits from the stream.