I have a function that calculates the hash of all files in a directory. As part of this, each file is opened, chunks are read, and the hash is updated:
import hashlib, os
def get_dir_hash(directory, verbose=0):
    hash = hashlib.sha256()
    if not os.path.exists(directory):
        return -1
    try:
        for root, dirs, files in os.walk(directory):
            for names in files:
                if verbose == 1:
                    print(f"Hashing {names}")
                filepath = os.path.join(root, names)
                try:
                    f1 = open(filepath, 'rb')
                except:
                    # You can't open the file for some reason
                    if f1 is not None:
                        f1.close()
                    continue
                while 1:
                    # Read file in as little chunks
                    buf = f1.read(4096)
                    if not buf:
                        break
                    hash.update(hashlib.sha256(str(buf).encode('utf-8')).hexdigest())
                if f1 is not None:
                    f1.close()
    except:
        import traceback
        # Print the stack traceback
        traceback.print_exc()
        return -2
    return hash.hexdigest()
Note that I read a chunk of bytes, convert it to a string, and encode it as UTF-8, as suggested by other answers here on SO:
hash.update(hashlib.sha256(str(buf).encode('utf-8')).hexdigest())
However, I still get this error:
Traceback (most recent call last):
  File "/home/user/Work/mmr6/mmr/util/dir_hash.py", line 33, in get_dir_hash
    hash.update(hashlib.sha256(str(buf).encode('utf-8')).hexdigest())
TypeError: Unicode-objects must be encoded before hashing
What am I missing?
Here is what you were missing.
In hash.update(hashlib.sha256(str(buf).encode('utf-8')).hexdigest())
the str(buf).encode('utf-8') part
is unnecessary: you can pass buf directly,
because it is already a bytes object. (Note also that str(buf) gives the text "b'...'", so you would be hashing that representation rather than the file's raw bytes.)
However, hashlib.sha256(buf).hexdigest()
returns a str, and hash.update() only accepts bytes, so that is where the error comes from.
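To see the types involved, here is a small standalone check. It is only a sketch: buf is a made-up value standing in for what f1.read(4096) returns, and outer stands in for the hash object in the question.

```python
import hashlib

# Hypothetical stand-in for a chunk returned by f1.read(4096) in 'rb' mode.
buf = b"some raw bytes"
digest = hashlib.sha256(buf).hexdigest()

print(type(buf).__name__)     # bytes
print(type(digest).__name__)  # str

outer = hashlib.sha256()
try:
    # Passing the str digest to update() is the call that fails in the question.
    outer.update(digest)
    rejected = False
except TypeError as exc:
    rejected = True
    print(f"TypeError: {exc}")

# Encoding the hex string first turns it into bytes, which update() accepts.
outer.update(digest.encode("utf-8"))
```

The exact wording of the TypeError message depends on the Python version, but the cause is the same: update() was given a str.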
The fixed version of the line would be:
hash.update(hashlib.sha256(buf).hexdigest().encode("utf-8"))
I'm not 100% sure this is what you wanted to do, so feel free to tell me if it isn't.
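If the goal is simply one digest over the contents of every file, a simpler variant might be to feed each chunk straight into the outer hash instead of hashing the chunk first. This is a sketch under my own assumptions, not a drop-in replacement: the name dir_hash_sketch and the chunk_size parameter are hypothetical, the sorting is something I added so the result does not depend on the order os.walk happens to return, and it produces a different digest from the fixed line above.

```python
import hashlib
import os


def dir_hash_sketch(directory, chunk_size=4096):
    """Return one SHA-256 hex digest over the contents of all files under directory."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # assumption: sort in place so subdirectories are visited in a fixed order
        for name in sorted(files):
            filepath = os.path.join(root, name)
            try:
                # 'with' closes the file even if reading fails partway through.
                with open(filepath, "rb") as f:
                    while True:
                        buf = f.read(chunk_size)
                        if not buf:
                            break
                        # buf is already bytes, so no str() or encode() is needed.
                        digest.update(buf)
            except OSError:
                # Skip files that cannot be opened or read, like the 'continue' in the question.
                continue
    return digest.hexdigest()
```

Calling dir_hash_sketch("/some/path") returns a 64-character hex string. Only file contents go into the digest here, so renaming a file without changing the sorted order would not change the result; if names matter to you, you could also update the digest with each encoded relative path.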