PyAES output includes b and ''


I am making an AES encryption/decryption program using PyAES, and when I print an output, it looks like this:

b'\xb6\xd52#\xb1\xd5a~.L\xc2M\x83U\xb3\xf6' (encrypted)

b'TextMustBe16Byte' (plaintext)

I want to eliminate the b and the apostrophes so that it looks cleaner on the front end.

My code:

import pyaes
import os

# A 256 bit (32 byte) key
key = os.urandom(32)

# For some modes of operation we need a random initialization vector
# of 16 bytes
iv = os.urandom(16)

aes = pyaes.AESModeOfOperationCBC(key, iv = iv)
plaintext = "TextMustBe16Byte"
ciphertext = aes.encrypt(plaintext)

# Prints a bytes repr such as b'\xd6:\x18\xe6\xb1\xb3\xc3\xdc\x87\xdf\xa7|\x08{k\xb6'
# (the exact value varies with the random key and IV)
print(ciphertext)


# The cipher-block chaining mode of operation maintains state, so
# decryption requires a new instance be created
aes = pyaes.AESModeOfOperationCBC(key, iv = iv)
decrypted = aes.decrypt(ciphertext)
print(decrypted)

Solution

  • bytes objects use their repr (with the b prefix and quotes) when printed or passed to str(). To convert one to an equivalent string, the simplest approach is to decode it as latin-1 (latin-1 is a one-to-one encoding that maps each byte to the Unicode code point of the same value).

    So just change:

    print(ciphertext)
    

    to:

    print(ciphertext.decode('latin-1'))
    

    and:

    print(decrypted)
    

    to:

    print(decrypted.decode('latin-1'))
    

    It looks like aes.encrypt implicitly "encodes" the input string as latin-1: it does [ord(c) for c in text], which is effectively a latin-1 encoding that never checks whether the characters are legal latin-1, so characters with ordinals above 255 will likely explode later in processing. Given that limitation of the module, decoding as latin-1 is a reasonable solution. If you want to support non-latin-1 input, encode the plaintext with a more capable encoding (e.g. utf-8) before passing it to encrypt, and decode the decrypted bytes with the same encoding at the other end. You'll still want latin-1 for the ciphertext no matter what, though; it's raw, random-looking bytes, so no other text encoding makes sense for it.
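
As a rough illustration of the points above, here is a minimal sketch that uses only the standard library, so it runs without pyaes. The ciphertext bytes are copied from the question's output; the `message` string and all variable names are made up for the example. It shows the repr that print produces, the lossless latin-1 round trip, and a UTF-8 encode/decode for text that latin-1 cannot represent.

```python
# Raw bytes such as AES ciphertext (values copied from the question's output).
ciphertext = b'\xb6\xd52#\xb1\xd5a~.L\xc2M\x83U\xb3\xf6'

# str() and print() on bytes give the repr, including the b prefix and quotes.
as_repr = str(ciphertext)

# latin-1 maps every byte value 0-255 to the code point of the same value,
# so decoding never fails and encoding the result restores the original bytes.
as_text = ciphertext.decode('latin-1')
round_tripped = as_text.encode('latin-1')

# Hypothetical plaintext with characters outside latin-1: encode it explicitly
# before encrypting, and decode with the same encoding after decrypting.
message = "caf\u00e9 \u2615"
plaintext_bytes = message.encode('utf-8')
recovered = plaintext_bytes.decode('utf-8')

print(as_repr.startswith("b'"))     # the b'...' form from the question
print(round_tripped == ciphertext)  # latin-1 round trip is lossless
print(recovered == message)         # UTF-8 round trip restores the text
```

Note that the UTF-8 encoded plaintext can be longer than the string it came from, and the CBC mode used in the question expects input in 16-byte blocks, so a real program would most likely need to pad the encoded bytes before encrypting; that step is left out here.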