I have used hashlib (which replaces md5 in Python 2.6/3.0), and it worked fine if I opened a file and put its content in the hashlib.md5()
function.
The problem is with very big files that their sizes could exceed the RAM size.
How can I get the MD5 hash of a file without loading the whole file into memory?
Break the file into 8192-byte chunks (or some other multiple of 128 bytes) and feed them to MD5 consecutively using update()
.
This takes advantage of the fact that MD5 has 128-byte digest blocks (8192 is 128×64). Since you're not reading the entire file into memory, this won't use much more than 8192 bytes of memory.
In Python 3.8+ you can do
import hashlib
with open("your_filename.txt", "rb") as f:
file_hash = hashlib.md5()
while chunk := f.read(8192):
file_hash.update(chunk)
print(file_hash.digest())
print(file_hash.hexdigest()) # to get a printable str instead of bytes