python, zip, sha256

Reading a zip file's content to compute its sha256 checksum fails


I have a zip file that contains some regular files, and it is uploaded to a fileserver. I am now trying to compute the sha256 checksum of the zip file, write the checksum into a *.sha256sum file, and upload that file to the fileserver as well.

Then, when someone downloads the zip file and the checksum file (.sha256sum) from the fileserver, they compute the sha256 of the zip file again and compare it with the one stored as text in the downloaded checksum file.

When I try to compute the sha256 checksum of the zip file, I get an error.

with open(filename) as f:
    data = f.read()
    hash_sha256 = hashlib.sha256(data).hexdigest()

The error is the following, and it is raised at the line data = f.read():

in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 44: character maps to <undefined>

Solution

  • You must open the file in binary mode:

    import hashlib

    # 'rb' returns raw bytes, so no text decoding is attempted
    with open(filename, 'rb') as f:
        data = f.read()
        hash_sha256 = hashlib.sha256(data).hexdigest()
    

    Per "Reading and Writing Files" in the Python tutorial:

    Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding.

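    So, in text mode Python decodes the raw bytes into text under the hood, which you don't want for a zip file. The archive is binary data, and the byte 0x90 has no mapping in the 'charmap' codec named in the traceback, hence the UnicodeDecodeError.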

    Appending a 'b' to the mode opens the file in binary mode. Binary mode data is read and written as bytes objects.
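
The full workflow from the question (hash the zip, store the digest in a .sha256sum file, verify it after download) could be sketched roughly as follows. This is a minimal sketch, not the answer's own code: the function names, the 64 KiB chunk size, and the "<digest>  <filename>" line layout (the layout the sha256sum command-line tool prints) are assumptions chosen for illustration. Reading in chunks is optional, but it avoids loading a large archive into memory at once.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in binary chunks."""
    digest = hashlib.sha256()
    # 'rb' yields bytes objects, so no text decoding is attempted.
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(path):
    """Write '<digest>  <filename>' to '<filename>.sha256sum' (hypothetical naming)."""
    path = Path(path)
    checksum_path = path.with_name(path.name + ".sha256sum")
    line = f"{sha256_of_file(path)}  {path.name}\n"
    # The checksum file is plain text, so text mode is appropriate here.
    checksum_path.write_text(line, encoding="utf-8")
    return checksum_path


def verify_checksum_file(path, checksum_path):
    """Return True if the file's digest matches the first field of the checksum file."""
    expected = Path(checksum_path).read_text(encoding="utf-8").split()[0]
    return sha256_of_file(path) == expected.lower()
```

On the uploading side one would call write_checksum_file(filename) and upload both files; on the downloading side, verify_checksum_file(filename, filename + ".sha256sum") returns False if the archive was corrupted or altered in transit.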