Tags: python, python-2.7, unc, hashlib

Python hashlib MD5 digest of any UNC file always yields same hash


The code below shows that three files on a UNC share hosted on another machine all produce the same MD5 hash, while local files produce different hashes. Why would this be? I suspect there is some UNC consideration that I don't know about.

Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> fn_a = '\\\\some.host.com\\Shares\\folder1\\file_a'
>>> fn_b = '\\\\some.host.com\\Shares\\folder1\\file_b'
>>> fn_c = '\\\\some.host.com\\Shares\\folder2\\file_c'
>>> fn_d = 'E:\\file_d'
>>> fn_e = 'E:\\file_e'
>>> fn_f = 'E:\\folder3\\file_f'
>>> f_a = open(fn_a, 'r')
>>> f_b = open(fn_b, 'r')
>>> f_c = open(fn_c, 'r')
>>> f_d = open(fn_d, 'r')
>>> f_e = open(fn_e, 'r')
>>> f_f = open(fn_f, 'r')
>>> hashlib.md5(f_a.read()).hexdigest()
'54637fdcade4b7fd7cabd45d51ab8311'
>>> hashlib.md5(f_b.read()).hexdigest()
'54637fdcade4b7fd7cabd45d51ab8311'
>>> hashlib.md5(f_c.read()).hexdigest()
'54637fdcade4b7fd7cabd45d51ab8311'
>>> hashlib.md5(f_d.read()).hexdigest()
'd2bf541b1a9d2fc1a985f65590476856'
>>> hashlib.md5(f_e.read()).hexdigest()
'e84be3c598a098f1af9f2a9d6f806ed5'
>>> hashlib.md5(f_f.read()).hexdigest()
'e11f04ed3534cc4784df3875defa0236'

EDIT: To investigate further, I also tested a file on another host. Changing the host appears to change the result.

>>> fn_h = '\\\\host\\share\\file'
>>> f_h = open(fn_h, 'r')
>>> hashlib.md5(f_h.read()).hexdigest()
'f23ee2dbbb0040bf2586cfab29a03634'

...but then I tried a different file on the new host, and got a new result!

>>> fn_i = '\\\\host\\share\\different_file'
>>> f_i = open(fn_i, 'r')
>>> hashlib.md5(f_i.read()).hexdigest()
'a8ad771db7af8c96f635bcda8fdce961'

So now I'm really confused. Could it have something to do with the original host being addressed in \\host.com form while the new host is addressed in \\host form?


Solution

  • I did some additional research based on the comments and answers everyone provided, and decided to test every combination of these two features of the code:

    1. How the path string literal is written, i.e. whether:
      A. The file path is a raw string with unescaped backslashes, vs.
      B. The file path is a regular (non-raw) string with doubled backslashes

      (For those who don't know, a raw string is one that is preceded by an "r", like this: r'This is a raw string')

    2. Whether the open function mode is r or rb.
      (Again for those who don't know, the b in rb mode means the file is read as binary.)

    The results demonstrated:

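    • Whether the path is written as a raw or an escaped string literal makes no difference: both spellings produce the same path string, and the hashes were unaffected.
    • My error was not opening the file in binary mode. With rb mode in open, each file got a different hash. On Windows, text mode (r) translates line endings and stops reading at the first Ctrl-Z (0x1A) byte, so the hash covers only part of a binary file; presumably the three UNC files begin with the same bytes up to that point.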

    Yay! And thanks for the help.
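
For anyone landing here with the same symptom, here is a minimal sketch of the binary-mode fix. The helper name md5_of_file and the chunk_size default are my own choices for illustration, not anything from the original session; reading in chunks is just a convenience so large files are not loaded into memory at once. The final assert only confirms that the raw and escaped spellings of the path are the same string.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read as raw bytes.

    `md5_of_file` and `chunk_size` are hypothetical names for this sketch.
    """
    digest = hashlib.md5()
    # 'rb' turns off newline translation and text-mode end-of-file
    # handling, so every byte of the file is fed to the hash.
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps calling read() until it returns b''.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# The raw and the escaped literal spell the same UNC path string.
assert (r'\\some.host.com\Shares\folder1\file_a'
        == '\\\\some.host.com\\Shares\\folder1\\file_a')
```

Calling md5_of_file(fn_a), md5_of_file(fn_b), and so on with the paths from the question should then give a distinct digest for each file whose contents differ.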