
Varying length of MD5 checksum in python


I use the following code to compute the binary representation of an MD5 hash.

MD5 is always 128 bits, and bin returns a string starting with "0b". Therefore, the length of md5_bin should always be 130, but when I run the program, it varies between 128 and 130 for different values of random_str.

import hashlib
md5_bin = bin(int(hashlib.md5(random_str).hexdigest(), 16))
print(len(md5_bin))
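To see the effect directly, here is a minimal Python 3 sketch. The question's snippet is Python 2, where `hashlib.md5` accepted a `str`; in Python 3 it needs `bytes`, so the inputs are encoded first. The inputs `input-0`, `input-1` and so on are arbitrary placeholders, not anything from the question. The sketch hashes them and records which lengths `bin` produces.

```python
import hashlib

def md5_bin_length(text):
    """Length of bin() applied to the MD5 of text, as in the question."""
    digest_int = int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)
    return len(bin(digest_int))

# MD5 digests are 16 bytes, i.e. 128 bits.
print(hashlib.md5().digest_size * 8)  # 128

# Hash many arbitrary (hypothetical) inputs and collect the distinct lengths.
lengths = {md5_bin_length("input-%d" % i) for i in range(1000)}
print(sorted(lengths))
```

The maximum is 130 ("0b" plus 128 digits), but shorter lengths show up whenever the hash happens to start with one or more 0 bits.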

Solution

  • Sure, MD5 is always 128 bits, but sometimes the first bit is a 0, and occasionally the second bit is too. bin does not print leading zeros, so those digits simply disappear.

    Think of it this way: the decimal strings '15' and '0015' both represent the same number, 15. When you ask Python to convert the int 15 to a string, you get '15', not '0015'. It has no way of knowing that you wanted 4 digits instead of 2:

    >>> n = int('0015')
    >>> str(n)
    '15'
    

    And it's the same with bin. It has no way of knowing that you wanted 128 bits instead of 126. You gave it a number with 126 bits, so it gives you 126 binary digits.

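    But you can tell it how many digits you want, e.g., with a format spec. The spec has to be applied to the integer, not to the string that bin already produced:

    md5_int = int(hashlib.md5(random_str).hexdigest(), 16)
    bits = format(md5_int, '0128b')


    … or, equivalently:

    bits = '{:0128b}'.format(md5_int)


    If you want the 0b prefix, you can add that. With the # flag the width counts the 0b prefix too, so the total width must be 130 rather than 128:

    bits = format(md5_int, '#0130b')
    bits = '{:#0130b}'.format(md5_int)
    bits = '0b{:0128b}'.format(md5_int)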
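Putting the fix together, here is a minimal Python 3 sketch. The names `md5_bits` and `md5_bits_from_digest` are hypothetical helpers, not part of `hashlib`, and the inputs are arbitrary placeholders. The second helper is a swapped-in alternative that skips the integer conversion and formats each byte of `digest()` as exactly 8 bits; it is included only as a cross-check that the zero-padded `format` call yields the same 128 digits.

```python
import hashlib

def md5_bits(data):
    """Return the MD5 of `data` (bytes) as exactly 128 '0'/'1' characters."""
    n = int(hashlib.md5(data).hexdigest(), 16)
    # '0128b' zero-pads on the left to 128 binary digits.
    return format(n, "0128b")

def md5_bits_from_digest(data):
    """Same result, built byte by byte from the raw 16-byte digest."""
    # Iterating over bytes yields ints in Python 3; each becomes 8 digits,
    # so leading zeros are never lost.
    return "".join(format(byte, "08b") for byte in hashlib.md5(data).digest())

for i in range(1000):
    data = ("input-%d" % i).encode("utf-8")
    bits = md5_bits(data)
    assert len(bits) == 128
    assert bits == md5_bits_from_digest(data)
    # With the '#' flag the width includes the '0b' prefix, hence 130.
    assert format(int(bits, 2), "#0130b") == "0b" + bits
print("all lengths are 128")
```

Working from the integer matches the question's code; working from `digest()` avoids the hex round trip entirely, at the cost of a slightly longer expression.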