So I wanted to get to know how hashlib.md5 works and produced the following code:
import hashlib

a = b'yolo'
h = hashlib.md5(a).digest()  # MD5 digest returned as a bytes object
b = h[6:10]                  # slice covering indices 6, 7, 8 and 9
print(h)
print(b)
Don't mind the fact that I used "yolo" as a string. This is just for testing.
Now when running this code, it produces
b'O\xde\xd1FG6\xe7xe\xdf#,\xbc\xb4\xcd\x19'
b'\xe7xe\xdf'
which quite frankly seems to be off. First of all, I expected 4 bytes (bytes 6-9, both included) to come out in the second line, and the first part (the \xe7xe) is not even a byte (afaik).
The documentation says that I should get a bytes object from the call to digest(), but for some reason this seems not to be the case(?..). My understanding is that a bytes object is just a list of bytes (and the function should thus produce an output like b'\x0f\xff\x75...' or whatever and never produce an output containing \xe7xe or start with a letter). What am I misunderstanding here?
b'\xe7xe\xdf' is a bytes object with exactly four bytes:
>>> [hex(b) for b in b'\xe7xe\xdf']
['0xe7', '0x78', '0x65', '0xdf']
It just so happens that two of those bytes fall in the printable ASCII range, so they're displayed as characters instead of \x## escape sequences:
0x78 is x
0x65 is e
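To make the display rule above concrete, here is a rough sketch of how each byte could be turned into the text you see inside b'...'. The helpers render_byte and render_bytes are hypothetical names made up for illustration, not part of Python, and the rule is deliberately simplified: the real repr also uses short escapes for tab, newline and carriage return and treats quote characters specially.

```python
def render_byte(value: int) -> str:
    # Hypothetical helper: approximates how repr() shows one byte inside b'...'.
    # Simplified: ignores the short escapes (tab, newline, carriage return)
    # and the quote handling that the real repr performs.
    if value == 0x5C:
        return "\\\\"  # a literal backslash is shown doubled
    if 0x20 <= value <= 0x7E:
        return chr(value)  # printable ASCII is shown as the character itself
    return f"\\x{value:02x}"  # everything else becomes a \xNN escape


def render_bytes(data: bytes) -> str:
    # Join the per-byte renderings into one string.
    return "".join(render_byte(v) for v in data)


sample = b'\xe7xe\xdf'
print(render_bytes(sample))   # \xe7xe\xdf
print(repr(sample)[2:-1])     # the same text, taken from Python's own repr
```

Under this rule, 0xe7 and 0xdf are outside the printable range and stay as escapes, while 0x78 and 0x65 are shown as x and e.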
For confirmation, you can compare b'\xe7\x78\x65\xdf' with b'\xe7xe\xdf'. They're two different representations of the exact same bytes:
>>> b'\xe7\x78\x65\xdf' == b'\xe7xe\xdf'
True
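If it helps, the same point can be checked without looking at the repr at all. This small sketch (the variable name data is just for illustration) counts the bytes and inspects them as integers:

```python
data = b'\xe7xe\xdf'

print(len(data))              # 4
print(list(data))             # [231, 120, 101, 223]
print(data[1] == ord('x'))    # True: indexing a bytes object yields an int
print(data[1:3])              # b'xe': slicing yields another bytes object
```

So the slice h[6:10] in the question did return four bytes; only the way they are printed made it look otherwise.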
For a more consistent human-readable representation, you can convert the bytes object to a hex string using its hex method:
>>> b'\xe7xe\xdf'.hex(' ')
'e7 78 65 df'
Or you can retrieve a hex string from the get-go by using hash.hexdigest instead of hash.digest.
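As a quick sketch of how the two digest forms relate, using the same b'yolo' input as the question (the variable names are only illustrative, and passing a separator to bytes.hex assumes Python 3.8 or later):

```python
import hashlib

hasher = hashlib.md5(b'yolo')
raw = hasher.digest()       # 16 raw bytes
text = hasher.hexdigest()   # 32 hex characters describing the same bytes

print(len(raw), len(text))          # 16 32
print(raw.hex() == text)            # True
print(bytes.fromhex(text) == raw)   # True
print(raw[6:10].hex(' '))           # e7 78 65 df
print(text[12:20])                  # e77865df
```

Each byte takes two hex characters, so bytes 6 to 9 of the raw digest correspond to characters 12 to 19 of the hex string.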