
Python gzip omit the original filename and timestamp


Folks, I am generating an md5sum of a gzip file. Each time, it's compressing the same file, but the resulting md5sum is different. How do I get Python to do what gzip's -n flag does, i.e. omit the original filename and timestamp?

f_in = open(tmpFile, 'rb')
f_out = gzip.open(uploadFile, 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

Thanks!


Solution

  • The GzipFile class allows you to explicitly provide the filename and the timestamp for the header.

    E.g.:

    #!/usr/bin/python
    import gzip

    # filename='' and mtime=0 keep both fields out of the gzip header.
    with open('out.gz', 'wb') as f:
        with gzip.GzipFile(filename='', mode='wb', compresslevel=9, fileobj=f, mtime=0) as gz:
            gz.write(b'this is a test')
    

    This will produce a gzip header with no filename and a modification time of zero, which RFC 1952 (the gzip format specification) defines as "no time stamp is available". That is the same effect as gzip -n.
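
Applied to the code in the question, the same idea might look like the sketch below. The helper names gzip_deterministic and md5sum and the temporary file names are made up for illustration; only gzip.GzipFile and its filename and mtime arguments come from the answer above. It assumes both archives are written by the same Python and zlib build, since a different zlib could compress the data differently.

```python
import gzip
import hashlib
import os
import shutil
import tempfile


def gzip_deterministic(src_path, dst_path):
    """Gzip src_path to dst_path, leaving filename and timestamp out of the header.

    Hypothetical helper, not part of the gzip module.
    """
    with open(src_path, 'rb') as f_in, open(dst_path, 'wb') as raw_out:
        # filename='' suppresses the name field; mtime=0 zeroes the timestamp field.
        with gzip.GzipFile(filename='', mode='wb', fileobj=raw_out, mtime=0) as gz_out:
            shutil.copyfileobj(f_in, gz_out)


def md5sum(path):
    """Return the hex MD5 digest of a file's contents (hypothetical helper)."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


# Demo: compress the same input twice, under two different output names,
# and compare the checksums of the resulting archives.
work_dir = tempfile.mkdtemp()
tmp_file = os.path.join(work_dir, 'input.txt')
with open(tmp_file, 'wb') as f:
    f.write(b'this is a test\n')

first = os.path.join(work_dir, 'first.gz')
second = os.path.join(work_dir, 'second.gz')
gzip_deterministic(tmp_file, first)
gzip_deterministic(tmp_file, second)
print(md5sum(first) == md5sum(second))
```

The output file is opened separately and passed as fileobj so that its name never reaches GzipFile. With gzip.open(uploadFile, 'wb'), the name is taken from the path and the current time is written into the header, which is why the md5sum kept changing.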