Streaming mode vs block mode


I can't figure out what exactly the streaming mode offered by modern compression/decompression algorithms (e.g. Zstandard or LZ4) is, and how I can exploit it.

As an example, suppose I have four 16 KB files. I can compress each file individually and obtain four compressed files of different lengths. However, I could also compress all four files together in streaming mode (feeding them in sequentially, right?), obtain a single compressed output, and expect the compression ratio to be better.

Can I decompress only the third file (say) without decompressing all the previous files? Does streaming mode introduce dependencies between the files I appended?


Solution

  • Yes, streaming introduces dependencies between files. In your example, decoding file3 would require first decoding file1, then file2.

    Note also that the data will appear concatenated, with no specific marker between files, so if it matters you need a way to know where each file starts and ends. Sometimes this is implicit (e.g. a fixed 16 KB size), sometimes it can be deduced from the data itself (a specific end-of-file marker), and sometimes it requires additional metadata. It all depends on the application.

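    You are correct that compressing the four files together, C(4 files), is expected to give a better compression ratio than compressing them separately, 4 × C(file), since matches found in earlier files can be reused when encoding later ones. This is especially true if the files are somewhat related (for example, if they are all text files).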
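A minimal sketch of that size comparison, in Python. Zstandard and LZ4 need third-party bindings, so this assumes the standard-library `zlib` module (DEFLATE) as a stand-in: `zlib.compress` on each file plays the role of block mode, and a single `zlib.compressobj` fed the files one after another plays the role of streaming mode. The four "files" are hypothetical generated text, and `make_files`, `individual_sizes` and `streamed_size` are illustrative names, not part of any library. DEFLATE can only refer back 32 KB, so other codecs will give different numbers.

```python
import zlib

# Assumption: zlib (DEFLATE) stands in for Zstandard/LZ4 in this sketch.
FILE_SIZE = 16 * 1024  # 16 KB per file, as in the question


def make_files(count=4):
    """Build hypothetical, related text files (same line template)."""
    files = []
    for i in range(count):
        buf = bytearray()
        line_no = 0
        while len(buf) < FILE_SIZE:
            buf += f"file={i} line={line_no} status=ok user=alice action=login\n".encode()
            line_no += 1
        files.append(bytes(buf[:FILE_SIZE]))
    return files


def individual_sizes(files):
    """Block mode: every file is compressed as an independent stream."""
    return [len(zlib.compress(data, 9)) for data in files]


def streamed_size(files):
    """Streaming mode: one compressor is fed every file in order, so later
    files can reuse matches from earlier ones (within the 32 KB window)."""
    comp = zlib.compressobj(9)
    out = bytearray()
    for data in files:
        out += comp.compress(data)
    out += comp.flush()
    return len(out)


if __name__ == "__main__":
    files = make_files()
    sizes = individual_sizes(files)
    print("individual:", sizes, "total:", sum(sizes))
    print("streamed:  ", streamed_size(files))
```

On related data like this the streamed size should come out smaller than the sum of the individual sizes, partly from cross-file matches and partly because the stream carries one zlib header and checksum instead of four.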
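The dependency and the boundary problem can be sketched the same way, again assuming Python's `zlib` as a stand-in for Zstandard/LZ4. The hypothetical `compress_stream` keeps a list of uncompressed file sizes as the extra metadata, and `extract_from_stream` has to decompress everything from the start of the stream up to the end of the requested file. For contrast, `compress_blocks` / `extract_block` compress each file as an independent stream and record compressed offsets, so a single file can be decoded on its own. All four function names and the size/offset index layout are illustrative assumptions, not a standard format.

```python
import zlib

# Assumption: zlib (DEFLATE) stands in for Zstandard/LZ4 in this sketch.


def compress_stream(files):
    """Streaming mode: one compressor fed every file in order.

    Returns the compressed stream plus the uncompressed sizes, which are the
    extra metadata needed to find file boundaries after decompression.
    """
    comp = zlib.compressobj(9)
    out = bytearray()
    for data in files:
        out += comp.compress(data)
    out += comp.flush()
    return bytes(out), [len(data) for data in files]


def extract_from_stream(blob, sizes, index):
    """Return file `index`; all files before it must be decompressed too."""
    start = sum(sizes[:index])
    end = start + sizes[index]
    if end == 0:
        return b""
    # Decompress from the very beginning of the stream, stopping once `end`
    # bytes of output exist. Earlier files cannot be skipped.
    prefix = zlib.decompressobj().decompress(blob, end)
    return prefix[start:end]


def compress_blocks(files):
    """Block mode: each file is an independent zlib stream.

    Returns the concatenated streams plus (offset, length) pairs locating
    each compressed file inside the archive.
    """
    blobs = [zlib.compress(data, 9) for data in files]
    offsets, pos = [], 0
    for b in blobs:
        offsets.append((pos, len(b)))
        pos += len(b)
    return b"".join(blobs), offsets


def extract_block(archive, offsets, index):
    """Random access: only the bytes of file `index` are decompressed."""
    start, length = offsets[index]
    return zlib.decompress(archive[start:start + length])


if __name__ == "__main__":
    # Hypothetical sample data: four small related text files.
    files = [(f"record from file {i}\n" * 1000).encode() for i in range(4)]

    blob, sizes = compress_stream(files)
    archive, offsets = compress_blocks(files)

    # Both approaches recover the third file (index 2).
    print(extract_from_stream(blob, sizes, 2) == files[2])
    print(extract_block(archive, offsets, 2) == files[2])
    print("stream:", len(blob), "bytes; blocks:", len(archive), "bytes")
```

The block-mode variant buys random access at the cost of the cross-file matches that make the streamed output smaller; which trade-off is right depends on how often individual files need to be read back.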