Tags: python, python-3.x, sha1, gnupg, hashlib

Python3 hashing gives different result


I am trying to calculate the SHA-1 hash of an encrypted file (file.gpg) using Python 3 code.

I tested two functions.

import hashlib
import gnupg

def sha1sum(filename):
    h  = hashlib.sha1()
    b  = bytearray(128*1024)
    mv = memoryview(b)
    with open(filename, 'rb', buffering=0) as f:
        for n in iter(lambda : f.readinto(mv), 0):
            h.update(mv[:n])
    return h.hexdigest()


def sha1_checksum(filename, block_size=65536):
    sha1 = hashlib.sha1()
    with open(filename, 'rb') as f:
        for block in iter(lambda: f.read(block_size), b''):
            sha1.update(block)
    return sha1.hexdigest()

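password = 'secret'   # placeholder passphrase (not given in the original post)
file = 'file.gpg'     # path of the encrypted output

gpg = gnupg.GPG()
gpg.encoding = 'utf-8'
# Encrypt file.bin symmetrically; the input is closed when the block ends.
with open('file.bin', 'rb') as original:
    encrypt = gpg.encrypt_file(original,
                               recipients=None,
                               passphrase=password,
                               symmetric='AES256',
                               output=file)

# Hash the encrypted file with both functions and print the digests.
print(sha1sum(file))
print(sha1_checksum(file))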

First run of the script:

697cee13eb4c91f41922472d8768fad076c72166
697cee13eb4c91f41922472d8768fad076c72166

Second run of the script:

a95593f0d8ce274492862b58108a20700ecf9d2b
a95593f0d8ce274492862b58108a20700ecf9d2b

Is sha1sum() or sha1_checksum() wrong?

Or does the encryption produce a different file.gpg on each run?


Solution

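  • This is not a problem with Python, or even with gpg. Both hash functions agree with each other on every run, so they are correct; it is file.gpg itself that differs between runs.

    The reason the hash changes is that gpg encryption, symmetric as well as asymmetric, is non-deterministic, or so-called probabilistic. On each run gpg draws fresh random values (for example, a random salt used to derive the key from the passphrase), so encrypting the same file.bin with the same passphrase yields a different ciphertext, and therefore a different SHA-1.

    Quoting the Wikipedia page on probabilistic encryption: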

    Probabilistic encryption is the use of randomness in an encryption algorithm, so that when encrypting the same message several times it will, in general, yield different ciphertexts. The term "probabilistic encryption" is typically used in reference to public key encryption algorithms, however various symmetric key encryption algorithms achieve a similar property (e.g., block ciphers when used in a chaining mode such as CBC).
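
    The effect can be reproduced without gpg. The sketch below is a toy illustration only: toy_encrypt and toy_decrypt are hypothetical helpers written for this example, they are not secure, and they do not reflect how gpg works internally. They only show that a random per-call value (here a 16-byte nonce) makes the ciphertext, and hence its SHA-1, change on every run, while the decrypted content stays the same.

```python
import hashlib
import os

NONCE_SIZE = 16  # assumption for this toy example, not a gpg parameter


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    stream = b''
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return stream[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy probabilistic cipher: a fresh random nonce is drawn on every call."""
    nonce = os.urandom(NONCE_SIZE)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body  # the nonce is stored in front of the ciphertext


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Reverse toy_encrypt using the nonce stored in the first bytes."""
    nonce, body = ciphertext[:NONCE_SIZE], ciphertext[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = b'passphrase'
message = b'contents of file.bin'

# Same key, same message, two separate encryptions.
c1 = toy_encrypt(key, message)
c2 = toy_encrypt(key, message)

print(hashlib.sha1(c1).hexdigest())  # differs from the next line
print(hashlib.sha1(c2).hexdigest())
print(toy_decrypt(key, c1) == toy_decrypt(key, c2) == message)
```

    If a stable checksum is needed, hash file.bin (the plaintext) rather than file.gpg, or encrypt once and keep the hash of that particular file.gpg.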