
hashlib.md5 returns a weird object and indexing is also strange


So I wanted to get to know how hashlib.md5 works and produced the following code:

import hashlib

a = b'yolo'

h = hashlib.md5(a).digest()
b = h[6:10]

print(h)
print(b)

Don't mind the fact that I used "yolo" as a string. This is just for testing.

Now when running this code, it produces

b'O\xde\xd1FG6\xe7xe\xdf#,\xbc\xb4\xcd\x19'
b'\xe7xe\xdf'

which, quite frankly, seems off. First of all, I expected 4 bytes (bytes 6-9, both included) to come out in the second line, and the first part (the \xe7xe) is not even a byte (afaik).

The documentation says that I should get a bytes object from the call to digest(), but for some reason this seems to not be the case(?..). My understanding is that a bytes object is just a list of bytes (and the function should thus produce an output like b'\x0f\xff\x75...' or whatever and never produce an output containing \xe7xe or start with a letter). What am I misunderstanding here?


Solution

  • b'\xe7xe\xdf' is a bytes object with exactly four bytes:

    >>> [hex(b) for b in b'\xe7xe\xdf']
    ['0xe7', '0x78', '0x65', '0xdf']
    

    It just so happens that two of those bytes fall in the printable ASCII range, so they're displayed as characters instead of \x## escape sequences.

    • 0x78 is x
    • 0x65 is e

    For confirmation, you can compare b'\xe7\x78\x65\xdf' with b'\xe7xe\xdf'. They're two different representations of the exact same bytes:

    >>> b'\xe7\x78\x65\xdf' == b'\xe7xe\xdf'
    True
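
To make the display rule concrete, here is a minimal sketch that prints each byte's numeric value next to the way the bytes repr shows it. The helper name show_byte is made up for this illustration and is not part of any library.

```python
def show_byte(value: int) -> str:
    """Pair one byte's hex value with how the bytes repr displays it."""
    # repr(bytes([value])) looks like b'...'; strip the leading b' and the trailing '
    shown = repr(bytes([value]))[2:-1]
    return f"0x{value:02x} -> {shown}"

# Iterating over a bytes object yields integers in the range 0-255.
for value in b'\xe7xe\xdf':
    print(show_byte(value))
```

The two middle bytes come out as the plain characters x and e, while the other two stay as \x escapes.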
    

    For a more consistent human-readable representation, you can convert the bytes object to a hex string using its hex method (the separator argument requires Python 3.8 or later):

    >>> b'\xe7xe\xdf'.hex(' ')
    'e7 78 65 df'
    

    Or you can retrieve a hex string from the get-go by using hash.hexdigest instead of hash.digest.
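
Putting this together with the code from the question, a short sketch along these lines shows that the slice really does hold four bytes. The variable name chunk is just an illustrative choice, and the value comments assume the digest printed in the question.

```python
import hashlib

h = hashlib.md5(b'yolo').digest()
chunk = h[6:10]        # slicing a bytes object gives another bytes object

print(len(h))          # 16, since an MD5 digest is always 16 bytes
print(len(chunk))      # 4
print(chunk.hex())     # e77865df
print(h[6])            # indexing a single position gives an int, here 231 (0xe7)

# hexdigest() returns the whole digest as a 32-character hex string
print(hashlib.md5(b'yolo').hexdigest())
```

Note that indexing and slicing behave differently: h[6] is an integer, while h[6:7] is a bytes object of length one.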