
zlib difference in size for level=0 between Python 3.9 and 3.10


The following code uses zlib to encode some data, but with level=0, so the data is not actually compressed:

import zlib

print('zlib.ZLIB_VERSION', zlib.ZLIB_VERSION)

total = 0
print('Total 1', total)
compress_obj = zlib.compressobj(level=0, memLevel=9, wbits=-zlib.MAX_WBITS)
total += len(compress_obj.compress(b'-' * 1000000))
print('Total 2', total)
total += len(compress_obj.flush())
print('Total 3', total)

Python 3.9.12 outputs

zlib.ZLIB_VERSION 1.2.12
Total 1 0
Total 2 983068
Total 3 1000080

but Python 3.10.6 (and Python 3.11.0) outputs

zlib.ZLIB_VERSION 1.2.13
Total 1 0
Total 2 1000080
Total 3 1000085

so both the final size and the size along the way differ.

Why? And how can I get them to be identical? (I'm writing a library where I would prefer identical behaviour between Python versions)


Solution

  • zlib 1.2.12 and 1.2.13 behave identically in this regard. The Python library must be making different deflate() calls with different amounts of data, and possibly introducing a flush in the later version. You can look in the Python source code to find out.

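    You should be able to force identical output if you feed smaller amounts of data to .compress() each time, e.g. less than 64K-1 (65,535) bytes, which is the maximum size of a stored block, and flush after each. Note that the flush between chunks has to be a non-finishing one such as .flush(zlib.Z_SYNC_FLUSH), since a bare .flush() defaults to Z_FINISH and ends the stream. The output will be larger, but should be identical across versions.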

    A quick look turned up this commit, which is likely the culprit.
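
The chunk-and-flush approach suggested above could look roughly like the sketch below. The helper name deflate_stored_chunked and the chunk size of 65,000 are made up for illustration; any size under 65,535 should do. Using Z_SYNC_FLUSH between chunks is an assumption about what "flush after each" means in practice, because a bare .flush() would finish the stream. Whether the bytes really match across Python versions is worth checking on each version you care about.

```python
import zlib

# Hypothetical chunk size: anything below 65,535 bytes, the stored-block limit.
CHUNK_SIZE = 65_000


def deflate_stored_chunked(data, chunk_size=CHUNK_SIZE):
    """Encode data as raw deflate at level=0, flushing after every chunk."""
    compress_obj = zlib.compressobj(level=0, memLevel=9, wbits=-zlib.MAX_WBITS)
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(compress_obj.compress(data[start:start + chunk_size]))
        # Z_SYNC_FLUSH pushes out everything buffered so far but keeps the
        # stream open; a bare .flush() would use Z_FINISH and end the stream.
        parts.append(compress_obj.flush(zlib.Z_SYNC_FLUSH))
    # The final flush (Z_FINISH by default) terminates the deflate stream.
    parts.append(compress_obj.flush())
    return b''.join(parts)


data = b'-' * 1000000
encoded = deflate_stored_chunked(data)
print('Encoded size', len(encoded))
# Round-trip check: raw deflate needs the same negative wbits to decode.
assert zlib.decompress(encoded, wbits=-zlib.MAX_WBITS) == data
```

Running this on each Python version and comparing the printed size (or a hash of encoded) is a simple way to confirm the output matches. The extra flushes make the result somewhat larger than the single-call version in the question.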