I'm trying to gzip a numpy array in Python 3.6.8.
If I run this snippet twice (different interpreter sessions), I get different output:
import gzip
import numpy
import base64
data = numpy.array([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]])
compressed = base64.standard_b64encode(gzip.compress(data.data, compresslevel=9))
print(compressed.decode('ascii'))
Example results (it's different every time):
H4sIAPjHiV4C/2NgAIEP9gwQ4AChOKC0AJQWgdISUFoGSitAaSUorQKl1aC0BpTWgtI6UFoPShs4AABmfqWAgAAAAA==
H4sIAPrHiV4C/2NgAIEP9gwQ4AChOKC0AJQWgdISUFoGSitAaSUorQKl1aC0BpTWgtI6UFoPShs4AABmfqWAgAAAAA==
      ^
Running it in a loop (so within the same interpreter session), it gives the same result each time:
for _ in range(1000):
    assert compressed == base64.standard_b64encode(gzip.compress(data.data, compresslevel=9))
How can I get the same result each time? (Preferably without external libraries.)
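The gzip format stores some file metadata in its header (a modification timestamp and, optionally, the original file name) alongside the compressed data (good discussion of that here). You are not using files per se, but you are still compressing at different times, and by default Python's gzip wrapper writes the current time into that header field. That matches your example: the two outputs differ in a single character near the start, which falls inside the header's timestamp bytes. The timestamp has one-second resolution, so a tight loop that finishes within the same second sees the same value on every iteration.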
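To check that the timestamp really is the only difference, the header field can be read back from the two outputs in the question. The following is a minimal sketch: the helper name gzip_header_mtime is made up for illustration, and only the first 12 base64 characters of each output are decoded, which is enough to cover the header bytes up to and including the timestamp.

```python
import base64
import struct

# First 12 base64 characters of the two outputs shown in the question.
# 12 characters decode to 9 bytes, which covers the fixed header fields
# up to and including the 4-byte timestamp.
first_prefix = "H4sIAPjHiV4C"
second_prefix = "H4sIAPrHiV4C"


def gzip_header_mtime(blob):
    """Return the MTIME field (seconds since the epoch) of a gzip header.

    Hypothetical helper, not part of the standard library.
    """
    # Bytes 0-1: magic number, byte 2: compression method, byte 3: flags.
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Bytes 4-7: MTIME as a little-endian unsigned 32-bit integer.
    (mtime,) = struct.unpack("<I", blob[4:8])
    return mtime


first_mtime = gzip_header_mtime(base64.standard_b64decode(first_prefix))
second_mtime = gzip_header_mtime(base64.standard_b64decode(second_prefix))

# The two runs were compressed a couple of seconds apart.
print(first_mtime, second_mtime, second_mtime - first_mtime)
```

Running the same helper on output produced with mtime=0 should return 0.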
So try passing the mtime=0 parameter to gzip.compress(data.data, compresslevel=9) if you have Python 3.8+ (the parameter was added in that version), as
gzip.compress(data.data, compresslevel=9, mtime=0)
If that does not work (e.g. on an older Python version such as your 3.6.8), you can use gzip.GzipFile with the mtime parameter, like this:
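import gzip
import io

buf = io.BytesIO()
# mtime=0 pins the header timestamp so the output is reproducible
with gzip.GzipFile(fileobj=buf, mode='wb', compresslevel=9, mtime=0) as f:
    f.write(data)  # any bytes-like object, e.g. data.data from the question
result = buf.getvalue()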
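As a rough sketch, the GzipFile approach can be wrapped in a small helper and checked for reproducibility. The name deterministic_gzip and the sample payload are made up for illustration, and only the standard library is used. With a numpy array you would pass data.data or data.tobytes() instead of the sample bytes.

```python
import gzip
import io


def deterministic_gzip(payload, compresslevel=9):
    """Gzip a bytes-like object in memory with the header timestamp set to 0.

    Hypothetical helper name; it only wraps gzip.GzipFile.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=compresslevel, mtime=0) as f:
        f.write(payload)
    return buf.getvalue()


# Stand-in payload; any bytes-like object works here.
payload = bytes(range(256)) * 4

blob_a = deterministic_gzip(payload)
blob_b = deterministic_gzip(payload)

# The 4 timestamp bytes in the header are all zero.
print(blob_a[4:8] == b"\x00\x00\x00\x00")
# Two calls give byte-for-byte identical output.
print(blob_a == blob_b)
# The data still round-trips.
print(gzip.decompress(blob_a) == payload)
```

Note that this only removes the timestamp as a source of variation. The compressed bytes could still differ between zlib versions or compression levels, so it is safest to compare outputs produced with the same Python build and settings.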
For details, see the documentation of the gzip module (gzip.compress and gzip.GzipFile) in the Python standard library reference.