
Does deflate compress tightly interleaved compressible and non-compressible data well?


Let's say I have a repeating pattern: 4 random bytes, 4 predictable bytes, 4 new random bytes, the same 4 predictable bytes, and so on. Is this something deflate can compress well?

Are 4 bytes too short for it to compress well?
Does deflate have any built-in support for interleaved compressible/non-compressible data like this?
Does any other common compression format handle this pattern better?


Solution

  • You'd have to define "well", but yes, deflate can and will take advantage of repeating strings as short as three bytes.

    There's nothing like just trying it. I generated 100,000 sets of four random bytes followed by four zeros, so 800,000 bytes total. gzip compressed it to about 500,000 bytes. That's not bad, since it can't do better than 400,000 bytes, which is the size of the random data alone.
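
    The experiment above is easy to repeat. Here is a minimal sketch in Python, assuming the standard-library zlib module as a stand-in for the gzip command (both use deflate; only the small header and trailer differ). The names interleaved_sample and deflate_size are made up for this illustration, and the exact compressed size will vary from run to run because the input is random.

```python
import os
import zlib


def interleaved_sample(sets=100_000, random_len=4, fixed=b"\x00" * 4):
    """Build `sets` records, each `random_len` random bytes followed by `fixed`.

    Hypothetical helper for this illustration; the defaults mirror the
    experiment described above (4 random bytes + 4 zero bytes, 100,000 times).
    """
    return b"".join(os.urandom(random_len) + fixed for _ in range(sets))


def deflate_size(data, level=9):
    """Return the size in bytes of `data` after zlib (deflate) compression."""
    return len(zlib.compress(data, level))


if __name__ == "__main__":
    data = interleaved_sample()
    random_bytes = len(data) // 2  # half of each record is incompressible
    compressed = deflate_size(data)
    print(f"original:    {len(data)} bytes")
    print(f"compressed:  {compressed} bytes")
    print(f"lower bound: {random_bytes} bytes (the random data alone)")
```

    Changing random_len or fixed lets you check how the ratio moves for other record layouts before committing to a format.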