
Python Compare local and remote file MD5 Hash


I am trying to compare the MD5 hashes of a local file and a remote file (the same file, copied into my WAMP "www" directory), but I don't understand why the checksums don't match.

Here's the checksum code:

#-*- coding: utf-8 -*-

import hashlib
import requests

def md5Checksum(filePath,url):
    if url==None:
        with open(filePath, 'rb') as fh:
            m = hashlib.md5()
            while True:
                data = fh.read(8192)
                if not data:
                    break
                m.update(data)
            return m.hexdigest()
    else:
        r = requests.get(url, stream=True)
        m = hashlib.md5()
        for line in r.iter_lines():
            m.update(line)
        return m.hexdigest()

print "checksum_local :",md5Checksum("projectg715gb.pak",None)
print "checksum_remote :",md5Checksum(None,"http://testpangya.ddns.net/projectg715gb.pak")

And I am surprised to get this output:

checksum_local : 9d33806fdebcb91c3d7bfee7cfbe4ad7
checksum_remote : a13aaeb99eb020a0bc8247685c274e7d

The size of "projectg715gb.pak" is 14.7 MB.

But if I try with a text file (size 1 KB):

print "checksum_local :",md5Checksum("toto.txt",None)
print "checksum_remote :",md5Checksum(None,"http://testpangya.ddns.net/toto.txt")

Then it works, and I get this output:

checksum_local : f71dbe52628a3f83a77ab494817525c6
checksum_remote : f71dbe52628a3f83a77ab494817525c6

I am new to comparing MD5 hashes, so please be nice; I may have made some big mistake. I don't understand why it doesn't work on big files. If someone could give me a hint, it would be super nice!

Thanks for reading and helping!


Solution

  • OK, it looks like I found a solution, so I will post it here :)

    First you need to create (or edit) an .htaccess file in the directory on your server where your files are.

    Content of the .htaccess file:

    ContentDigest On
    

    Now that this is set up, the server should send a Content-MD5 field in the HTTP response headers. Its value is the base64-encoded MD5 digest of the file, not the hex string.

    It will look something like:

    'Content-MD5': '7dVTxeHRktvI0Wh/7/4ZOQ=='
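
Since that header carries the digest as base64 while hexdigest() returns hex, it can be handy to convert between the two forms. A minimal sketch using only the standard library; the helper names hex_to_content_md5 and content_md5_to_hex are made up for illustration and are not part of any library:

```python
import base64
import binascii
import hashlib


def hex_to_content_md5(hex_digest):
    """Convert a hex MD5 string into the base64 form used by Content-MD5."""
    raw = binascii.unhexlify(hex_digest)          # 32 hex chars -> 16 raw bytes
    return base64.b64encode(raw).decode("ascii")


def content_md5_to_hex(header_value):
    """Convert a Content-MD5 header value back into a hex MD5 string."""
    raw = base64.b64decode(header_value)          # base64 -> 16 raw bytes
    return binascii.hexlify(raw).decode("ascii")


if __name__ == "__main__":
    # Both strings describe the same 16-byte digest, just encoded differently
    hex_digest = hashlib.md5(b"hello").hexdigest()
    print(hex_digest, "<->", hex_to_content_md5(hex_digest))
```

Either form can be used for the comparison, as long as both sides are in the same encoding.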
    

    OK, now for the Python part: I modified my code so that it compares this HTTP header value with the local MD5 checksum.

    # -*- coding: utf-8 -*-
    
    import base64
    import hashlib
    
    import requests
    
    def md5Checksum(filePath, url):
        if url is None:
            # Local file: hash it in 8 KB chunks so a large file is never fully in memory
            m = hashlib.md5()
            with open(filePath, 'rb') as fh:
                while True:
                    data = fh.read(8192)
                    if not data:
                        break
                    m.update(data)
            # Base64-encode the raw digest so it matches the Content-MD5 format
            return base64.b64encode(m.digest()).decode('ascii')
        else:
            # Remote file: a HEAD request fetches only the headers, not the body
            r = requests.head(url)
            # Raises KeyError if the server does not send Content-MD5
            return r.headers['Content-MD5']
    
    def compare():
        local = md5Checksum("projectg502th.pak.zip", None)
        remote = md5Checksum(None, "http://127.0.0.1/md5/projectg502th.pak.zip")
    
        # Matching checksums mean the local copy is up to date
        if local == remote:
            print("The soft don't download the file")
        else:
            print("The soft download the file")
    
    print("checksum_local :", md5Checksum("projectg_ziinf.pak.zip", None))
    print("checksum_remote :", md5Checksum(None, "http://127.0.0.1/md5/projectg_ziinf.pak.zip"))
    
    compare()
    

    Output :

    checksum_local : 7dVTxeHRktvI0Wh/7/4ZOQ==
    checksum_remote : 7dVTxeHRktvI0Wh/7/4ZOQ==
    The soft don't download the file
    

    I hope this will help ;)
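
As for why the original streaming version disagreed on the large file: the most likely cause is that iter_lines() splits the body on line endings and yields each line without its terminator, so every newline byte in a binary file is left out of the hash. A small text file with no line breaks is unaffected, which may be why toto.txt matched. If the server cannot be configured to send Content-MD5, a sketch of hashing the raw bytes with iter_content() instead looks like this (md5_of_chunks and md5_of_url are illustrative names, and the 8192-byte chunk size is an arbitrary choice):

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 of an iterable of byte strings."""
    m = hashlib.md5()
    for chunk in chunks:
        m.update(chunk)
    return m.hexdigest()


def md5_of_url(url, chunk_size=8192):
    """Download url in chunks and return the hex MD5 of the exact body bytes."""
    # Imported here so md5_of_chunks can be used without requests installed
    import requests

    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # iter_content yields the body bytes unchanged, newlines included
        return md5_of_chunks(r.iter_content(chunk_size=chunk_size))


if __name__ == "__main__":
    # No network needed: show how dropping line terminators changes the hash
    data = b"line one\nline two\r\nbinary\x00tail\n"
    print("all bytes      :", md5_of_chunks([data]))
    print("lines, no EOLs :", md5_of_chunks(data.splitlines()))
```

This downloads the whole file just to hash it, so the Content-MD5 approach above remains cheaper when the goal is to decide whether a download is needed at all.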