I can't figure out what exactly the streaming mode offered by modern compression/decompression algorithms (e.g. Zstandard or LZ4) is, and how I can exploit it.
As an example, suppose I have 4x16KB files. I can compress each file individually and obtain 4xDifferentCompressedLength files. However, I could compress all 4 files together (sending them sequentially, right?) using streaming mode, obtain 1xCompressedLength, and expect the compression ratio to be better.
Can I decompress (say) only the 3rd file without decompressing all the previous files? Does streaming mode introduce a dependency between the files I appended?
Yes, streaming introduces a dependency between files.
In your example, decoding file3 would require decoding file1 first, then file2.
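To make the dependency concrete, here is a minimal sketch. It uses zlib from Python's standard library as a stand-in for Zstandard or LZ4 (an assumption, made only so the example runs without extra packages); the file contents and the helper names `compress_stream`, `decode_up_to` and `decode_alone` are hypothetical. The four files go through one compressor, and the third one is only recoverable after the earlier compressed pieces have passed through the same decoder.

```python
import zlib

# Four hypothetical 16 KB "files" with related content.
files = [((b"record %d: streaming example payload\n" % i) * 500)[:16 * 1024]
         for i in range(4)]


def compress_stream(chunks):
    """Compress all chunks as ONE stream; return the piece emitted per chunk."""
    comp = zlib.compressobj()
    pieces = []
    for chunk in chunks:
        # Z_SYNC_FLUSH emits everything consumed so far but keeps the
        # compressor's history, so later chunks may refer to earlier ones.
        pieces.append(comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH))
    pieces[-1] += comp.flush()  # terminate the stream
    return pieces


def decode_up_to(pieces, index):
    """Decode chunk `index` by first feeding every earlier piece to the decoder."""
    decomp = zlib.decompressobj()
    out = b""
    for piece in pieces[: index + 1]:
        out = decomp.decompress(piece)  # earlier output is simply discarded
    return out


def decode_alone(piece):
    """Try to decode a single piece with a fresh decoder; None if rejected."""
    try:
        return zlib.decompressobj().decompress(piece)
    except zlib.error:
        return None


pieces = compress_stream(files)
print(decode_up_to(pieces, 2) == files[2])   # file3 after file1 and file2
print(decode_alone(pieces[2]) == files[2])   # file3 on its own
```

The first piece can be decoded on its own because nothing precedes it; every later piece depends on the decoder state built up by the pieces before it. The exact failure mode when a piece is decoded in isolation depends on the library and format.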
Note also that the data will appear as if appended, with no specific marker between files. So you need a way to know where each file starts and ends, if that matters. Sometimes it's implicit (e.g. a fixed 16KB size), sometimes it can be deduced from the data itself (a specific end-of-file mark), and sometimes it needs additional metadata. It all depends on the application.
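As an illustration of two of these options, the sketch below shows the implicit case (slicing a fixed 16KB record out of the decompressed stream) and the metadata case (a length prefix in front of each file). The 4-byte big-endian prefix and the function names `nth_fixed`, `pack_files` and `unpack_files` are assumptions chosen for the example, not part of any compression format.

```python
import struct

FIXED_SIZE = 16 * 1024  # assumed fixed file size for the implicit case


def nth_fixed(stream, n, size=FIXED_SIZE):
    """Implicit boundaries: file n occupies bytes [n*size, (n+1)*size)."""
    return stream[n * size:(n + 1) * size]


def pack_files(files):
    """Explicit metadata: prefix each file with its length (4 bytes, big-endian)."""
    return b"".join(struct.pack(">I", len(f)) + f for f in files)


def unpack_files(blob):
    """Walk the length prefixes to split the stream back into files."""
    files, pos = [], 0
    while pos < len(blob):
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        files.append(blob[pos:pos + size])
        pos += size
    return files


originals = [b"first file", b"", b"third file, a bit longer"]
blob = pack_files(originals)          # this is what would be fed to the compressor
print(unpack_files(blob) == originals)
```

The framing is applied to the uncompressed data before it enters the compressor, so it survives whatever codec is used; it locates a file in the decompressed output but does not remove the need to decompress what comes before it.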
You are correct that the compression ratio of C(4xFiles) is expected to be better than that of 4xC(File), especially if the 4 files are somewhat related (for example, if they are all text files).
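A quick way to check this on your own data is to compare the two totals directly. The sketch below again uses zlib from Python's standard library as a stand-in for Zstandard or LZ4, and the test data is made up: four files of about 16KB each that contain the same 400 lines in different orders, which is a deliberately favorable case of "related" files.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: 400 distinct 41-byte lines (about 16 KB in total).
pool = [b"%040x\n" % rng.getrandbits(160) for _ in range(400)]

# Each file holds the same lines in a different order, so the files are
# related to each other but contain no repeated lines internally.
files = [b"".join(rng.sample(pool, len(pool))) for _ in range(4)]

# 4xC(File): every file compressed on its own.
separate = sum(len(zlib.compress(f)) for f in files)

# C(4xFiles): the files compressed back to back as a single input.
together = len(zlib.compress(b"".join(files)))

print("separate:", separate, "bytes")
print("together:", together, "bytes")
```

How large the gain is depends on how much the files share and on how far back the compressor can look; with unrelated or already-compressed files it can shrink to almost nothing.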