compression 7zip lzma lossless-compression

Use LZMA to encode a stream of information


My professor gave me a research paper that describes a way to efficiently compress a certain kind of data. The full algorithm isn't worth explaining since the question isn't about it, so I'll just give a small example that should make the real question clear.

Our compression algorithm has its own dictionary, which is a table (it doesn't matter how it is computed; just assume that both the compressor and the decompressor have it). Each table row holds a string. To compress a message, the compressor reads it from the beginning and searches the dictionary for a match. If one is found, it sends a MATCH message with the row id; if nothing is found, it sends a SET message containing the data to set. Note that a MATCH does not have to be an exact match: it can be followed by any number of MISMATCH messages, each containing the offset of a wrong byte and the correct byte.
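To make the message scheme concrete, here is a minimal Python sketch of how such messages could be serialized to bytes. The paper does not give a wire format, so everything here is an assumption made for illustration: the one-byte tags, the field widths, the `max_mismatches` threshold, and the rule that only rows of the same length as the chunk are considered.

```python
import struct

# Hypothetical one-byte message tags (not taken from the paper).
MATCH, MISMATCH, SET = 0, 1, 2

def encode_match(row_id: int) -> bytes:
    # tag + 32-bit little-endian row id
    return struct.pack("<BI", MATCH, row_id)

def encode_mismatch(offset: int, correct_byte: int) -> bytes:
    # tag + 16-bit offset of the wrong byte + the correct byte value
    return struct.pack("<BHB", MISMATCH, offset, correct_byte)

def encode_set(payload: bytes) -> bytes:
    # tag + 16-bit payload length + the literal payload
    return struct.pack("<BH", SET, len(payload)) + payload

def encode_chunk(chunk: bytes, table, max_mismatches: int = 2) -> bytes:
    """Emit MATCH (plus MISMATCH fixes) for the closest same-length row, else SET."""
    best_id, best_diffs = None, None
    for row_id, row in enumerate(table):
        if len(row) != len(chunk):
            continue
        # Positions where the row differs from the chunk, with the correct byte.
        diffs = [(i, c) for i, (r, c) in enumerate(zip(row, chunk)) if r != c]
        if best_diffs is None or len(diffs) < len(best_diffs):
            best_id, best_diffs = row_id, diffs
    if best_diffs is None or len(best_diffs) > max_mismatches:
        return encode_set(chunk)
    out = encode_match(best_id)
    for offset, byte in best_diffs:
        out += encode_mismatch(offset, byte)
    return out
```

With a table of `[b"hello", b"world"]`, the chunk `b"hellp"` would become one MATCH for row 0 followed by one MISMATCH at offset 4, while `b"xyz"` would fall back to a SET.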

So for example the compressor might want to encode:

Now, in the paper they say that they entropy encode this "stream" of data using LZMA and they assume it's a trivial thing to do without giving further details.
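One plausible reading of that step, sketched below, is that the messages are simply serialized to a byte stream and that byte stream is fed to a general-purpose LZMA compressor. This uses Python's standard `lzma` module with the default `.xz` container; the paper does not say which container or settings it uses, and the sample bytes are made up for illustration.

```python
import lzma

def entropy_encode(stream: bytes) -> bytes:
    # Compress the serialized message stream with LZMA at the highest preset.
    return lzma.compress(stream, preset=9)

def entropy_decode(blob: bytes) -> bytes:
    # Recover the original serialized message stream.
    return lzma.decompress(blob)

# Made-up stand-in for a serialized run of MATCH / MISMATCH messages.
sample_stream = bytes([0, 0, 0, 0, 0, 1, 4, 0, 112]) * 200

compressed = entropy_encode(sample_stream)
restored = entropy_decode(compressed)
```

Since LZMA works on plain bytes, the only real design decision is how the messages are laid out; keeping fields of the same kind in a regular, repetitive layout tends to give the compressor more to work with.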

I've searched online but couldn't find anything. Do you have any idea how this last step could be done? Do you have any references?


Solution

  • There is a stream compression algorithm using LZMA with a preset dictionary in this open-source project: Zip-Ada. There, the preset dictionary is called "training data".