Search code examples
algorithmpseudocodecompressionlzw

LZW-decompressor algorithm


I'm having a hard time to understand the LZW algorithm. I'm examining the pseudocode supplied on wikipedia (http://en.wikipedia.org/w/index.php?title=Lempel-Ziv-Welch&oldid=245292660) and there's one part in the decompressor code that I don't understand:

      else if (k == currSizeDict)
          entry = w + w[0];

Can someone explain to me a scenario where this would happen?


Solution

  • This problem is explained very well here: https://www.cs.duke.edu/csed/curious/compression/lzw.html. The basic idea is that since LZW only requires the compressed string and a dictionary containing all elements of the alphabet (rather than a dictionary containing all encoded patterns), it's necessary to reconstruct all encodings of more complex patterns on the fly while decoding. This results in a situation where it's possible to run into encodings that aren't in the dictionary. Interestingly enough, as the link above points out, this can only happen when the encoded string begins and ends with the same character.