python decode digital-signature pycryptodome

Why do I have to hexlify before decoding into ascii?


So I've been trying to get better acquainted with crypto using Python (specifically pycryptodome), and I've come across an interesting issue trying to decode a byte string into ASCII. Please see the code below:

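import binascii  # needed for binascii.hexlify below

import Crypto.Random  # needed for Crypto.Random.new() below
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA
from Crypto.PublicKey import RSA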
message = b'Something secret'

random_gen = Crypto.Random.new().read
print("Type of random_gen: {}".format(type(random_gen)))
private_key = RSA.generate(1024, random_gen) # private key
public_key = private_key.publickey() # public key

signer = PKCS1_v1_5.new(private_key) # signer which uses private key
verifier = PKCS1_v1_5.new(public_key) # verifier which uses public key

h = SHA.new(message) # hash of message
print("Hash: {}".format(h.hexdigest()))

signature = signer.sign(h) # sign hashed version of message
print("Signature type = {}".format(type(signature)))
print("Signature: {}".format(binascii.hexlify(signature).decode('ascii')))

In the very last line of the code why is it that I have to first hexlify() the signature which is of type <class 'bytes'> before decoding it into ascii so that I can read the signature? Why is it that if I do:

print("Signature: {}".format(signature.decode('ascii')))

I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 2: ordinal not in range(128)

Thanks for the help.


Solution

  • signature is a sequence of bytes: each element is an integer between 0 and 255 inclusive. If you try to decode it directly as ASCII, any value above 127 raises a UnicodeDecodeError, because ASCII only defines code points 0 to 127.

    binascii.hexlify returns a new sequence of bytes built from its input: for each input byte, it emits two output bytes, which are the ASCII codes of the two characters in that byte's hexadecimal representation. Each byte of the output therefore represents an ASCII character either between '0' and '9' or between 'a' and 'f'. For example, the input byte 128 produces the two characters "80", that is, the two bytes 56 and 48 (the ASCII codes of the characters '8' and '0').

    So binascii.hexlify produces the hexadecimal representation, in ASCII form, of a binary input. Applying decode('ascii') after binascii.hexlify does not change the content; it only produces an object of type str.

    In Python 3.5 and above, you can simply use the hex method of a bytes object to obtain a str object containing its hexadecimal representation:

    signature.hex()
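
To see all of this without generating a key, here is a minimal sketch that uses a short hand-written bytes value in place of a real signature. The name raw and its contents are made up for illustration; they are chosen so that the byte 0x88 sits at position 2, as in the error message from the question.

```python
import binascii

# Hypothetical stand-in for a signature: arbitrary bytes, some above 127.
raw = bytes([0x12, 0x7f, 0x88, 0x80, 0xff])

# Decoding directly as ASCII fails at the first byte above 127.
try:
    raw.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# hexlify maps every input byte to two ASCII bytes drawn from 0-9 and a-f.
hex_bytes = binascii.hexlify(raw)      # still a bytes object
hex_text = hex_bytes.decode('ascii')   # same characters, now a str

print(decode_error.start)                     # 2 (index of the byte 0x88)
print(hex_bytes)                              # b'127f8880ff'
print(list(binascii.hexlify(bytes([128]))))   # [56, 48], the codes of '8' and '0'
print(hex_text == raw.hex())                  # True
print(bytes.fromhex(hex_text) == raw)         # True, the hex text is reversible
```

The last line shows why a hex string is a convenient way to print or store a signature: bytes.fromhex recovers the original bytes exactly.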