python c# encryption cryptography aes

AES - Decryption


I have some C# code that encrypts the body of an email before it sends it to another email account, using AES. I believe the default mode for AES in C# is CBC and I also believe the default padding method in C# is PKCS#7.

The C# code applies the Default encoding to the ciphertext, possibly using the machine's active code page. The active code page of both the server and the local machine is cp437. Decryption is done in C++ in the production environment and it works. I need a Python 3 equivalent for the decryption.


Solution

  • Before the mail is sent, the C# code explicitly decodes the ciphertext bytes into a string using the default encoding, which according to the question is Cp437. With Cp437, however, encoding the received text back into bytes fails, whereas it succeeds with Cp1252.

    Using Cp1252 results in mail.BodyEncoding being implicitly set from ASCIIEncoding (default) to UTF8Encoding and mail.BodyTransferEncoding to TransferEncoding.Base64.

    Cp1252 has (in contrast to e.g. Cp437) undefined codepoints, namely 0x81, 0x8d, 0x8f, 0x90 and 0x9d. The defined codepoints of Cp1252 can be converted to UTF-8 without any problems, since each Cp1252 character corresponds to a valid Unicode character, e.g. the euro sign U+20AC (€) is 0x80 in Cp1252 and 0xE282AC in UTF-8. It is not clear a priori, however, how the undefined codepoints are converted. It turns out that they are simply UTF-8 encoded as the Unicode code points of the same value, i.e. 0x81, 0x8d, 0x8f, 0x90 and 0x9d are converted to 0xc281, 0xc28d, 0xc28f, 0xc290 and 0xc29d. After the UTF-8 encoding, the Base64 encoding is performed.
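
The conversion described above can be reproduced with a small sketch. The helper `dotnet_like_decode` is hypothetical: it only models the behaviour observed here (undefined Cp1252 bytes becoming the Unicode code points of the same value) and is not taken from the C# code or the .NET documentation.

```python
# Bytes to which Cp1252 assigns no character.
UNDEFINED_CP1252 = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

def dotnet_like_decode(raw: bytes) -> str:
    """Assumed model of the sender side: decode as Cp1252, but map the
    undefined bytes to the code points U+0081, U+008D, ... of the same value."""
    return ''.join(
        chr(b) if b in UNDEFINED_CP1252 else bytes([b]).decode('cp1252')
        for b in raw
    )

# 0x80 is the euro sign in Cp1252, 0x81 is undefined, 0x41 is 'A'.
text = dotnet_like_decode(bytes([0x80, 0x81, 0x41]))
print(text.encode('utf-8').hex())  # e282acc28141
```

The euro sign becomes the three bytes e2 82 ac, while the undefined byte 0x81 becomes c2 81, which matches the values listed above.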

    For the decryption in the Python code you simply have to proceed in the opposite direction: first Base64 decoding, then UTF-8 decoding and finally Cp1252 encoding (keeping in mind the undefined codepoints, which have to be passed through as raw bytes). The result is the actual ciphertext.

    A possible implementation of the encoding is:

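    import base64

    def customEncode(ciphertextB64):
        # Undo the transfer encodings: Base64 first, then UTF-8.
        cipherbytes = base64.b64decode(ciphertextB64)
        ciphertext = cipherbytes.decode('utf8')
        # Bytes that Cp1252 leaves undefined; they arrive as U+0081, U+008D, ...
        undefCodepoints = {0x81, 0x8d, 0x8f, 0x90, 0x9d}
        result = []
        for char in ciphertext:
            if ord(char) in undefCodepoints:
                data = bytes([ord(char)])     # pass through as the raw byte
            else:
                data = char.encode('Cp1252')  # regular Cp1252 character
            result.append(data)
        return b''.join(result)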
    

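    With this, the posted ciphertext can be encoded and decrypted (AES and unpad come from PyCryptodome):

    from Crypto.Cipher import AES
    from Crypto.Util.Padding import unpad
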
    ciphertextB64 = """amDDjAsQJjEzw7nFvSpcIgHFksO/xb3CoF7Cv+KAsMKN4oCZNeKAulxaHwghwo1ExaEQKcOrGMO9
                       Iw4Rw4sncsOLxb3Di8OKwqs0AgdFwo1CB8Oy4oCTPkrDlSbDkTDDtB3Cj2PDocW4AcKxM0bigJnD
                       gsOsw6scw47DlQHCuEEnwqxZwqnDp8KdDBNzw7JKw70aw5/DtcK2FHzigJNJwq1kBsKyw57CpMOi
                       CkPigJQnw5nCgVUcw5bCtl9j4oCcG8OGw5Yiw4zCv1bDrzhBMREIwr1yKMOTT8OMw68OVsOKeGxx
                       wq3Dv0Nkw4vDgcO4wqYCw7DDi8OEFsKjEcOjwrdzw5RUdU/CqwBZw6rDvcKsw67DvE5lwqvDhMKv
                       w5HDiwBy4oC6NsO+w5vigLDDjcOGMHElHA7CjULDnsKtUuKAoH0LUxclPV3FuMO6aWtVAuKAnlcF
                       wr/CsDbCqQEAwr1JMcW+w692w6XCrgbDt8ObZDnDgcOqF8KmwrrCucK9a8Kt4oCdZMOpHiPDigfD
                       hcWSNMOmw7zDhcOPAMOlSzXDs2XDlMKoBcOLdcOMw5PDjeKAusKxw7U94oCgY8Oww6XDrnLigJQj
                       csO7wq8vy5wJMcK4cuKAoMO/MsO/4oCwJcOdXMK0fWxND8uGbiRnBMW4Kw=="""
    cipherbytes = customEncode(ciphertextB64)
    key = b'\x12\x34\x56\x78\x9A\xBC\xDE\xF0\x12\x34\x56\x78\x9A\xBC\xDE\xF0\x12\x34\x56\x78\x9A\xBC\xDE\xF0\x12\x34\x56\x78\x9A\xBC\xDE\xF0'
    iv1 = b'\x02\x13\x24\x35\x46\x57\x68\x79\x8A\x9B\xAC\xBD\xCE\xDF\xE0\xF1'
    cipher = AES.new(key, AES.MODE_CBC, iv1)
    decrypted = cipher.decrypt(cipherbytes)
    decryptedUnpad = unpad(decrypted, AES.block_size)
    print(decryptedUnpad) # b'<!DOCTYPE html><html><head><title>Register new RikRhino camera</title></head><body><p>IMEI:324<br/>ServerUrl:https://cmorelm.chpc.ac.za/za<br/>Token:1m7e9LaDp42v6l8hm71l5tZe9z4vO4EFDmiZHiH06e4=<br/>destinationGroup:7<br/>Altitude:4.7<br/>Latitude:-33.7498685982923<br/>Longitude:19.3239212036133</p></body></html>'
    

    The decrypted plaintext is:

    <!DOCTYPE html><html><head><title>Register new RikRhino camera</title></head><body><p>IMEI:324<br/>ServerUrl:https://cmorelm.chpc.ac.za/za<br/>Token:1m7e9LaDp42v6l8hm71l5tZe9z4vO4EFDmiZHiH06e4=<br/>destinationGroup:7<br/>Altitude:4.7<br/>Latitude:-33.7498685982923<br/>Longitude:19.3239212036133</p></body></html>
    

    The encoding is unnecessarily complicated. Furthermore, platform-specific dependencies cannot be excluded (e.g. the handling of the undefined codepoints), so that the implementation may be platform-dependent and therefore not reliable.

    The most reasonable fix is therefore to use a binary-to-text encoding like Base64 in the C# code instead of the charset encoding (in combination with the default values ASCIIEncoding for mail.BodyEncoding and TransferEncoding.SevenBit for mail.BodyTransferEncoding). Otherwise, this issue will probably continue to cause difficulties in the future.
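
To illustrate why a binary-to-text encoding avoids the problem, here is a minimal sketch in Python. The function names `to_mail_body` and `from_mail_body` are made up for this example; on the C# side the counterpart would presumably be something like `Convert.ToBase64String`.

```python
import base64

def to_mail_body(ciphertext: bytes) -> str:
    """Encode arbitrary bytes as pure ASCII text, safe for a 7-bit mail body."""
    return base64.b64encode(ciphertext).decode('ascii')

def from_mail_body(body: str) -> bytes:
    """Recover the original bytes; no charset is involved."""
    return base64.b64decode(body)

# Every possible byte value, including the ones Cp1252 leaves undefined.
ciphertext = bytes(range(256))
body = to_mail_body(ciphertext)
assert body.isascii()
assert from_mail_body(body) == ciphertext
```

Because the body contains only ASCII characters, neither the default code page of the sending machine nor the body encoding chosen by the mail library can alter the data.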


    Update: Concerning your question "Why does the encoding seem to fail with Cp437, whereas it is successful with Cp1252? This is weird especially since, as you mentioned, an explicit decoding is performed in the C# code using the default encoding (which we found is Cp437)."
    From the C# code (now only accessible via the edit history) it is only evident that the default encoding is used. That it is Cp437 cannot be deduced from the code; this was information you provided afterwards (originally you stated that it was ISO-8859-1 or UTF-8 or UTF-16, see the history). Since the posted message can be encoded with Cp1252 but not with Cp437, it is more likely that the default actually used was Cp1252 and not Cp437.
    The default encoding is platform dependent (another reason not to use it for encoding a ciphertext) and sometimes not very transparent. E.g. on Windows systems there are two code pages (ANSI and OEM), which can be quite different (e.g. ANSI: often Cp1252 in Western Europe, OEM: often Cp850 in Western Europe, Cp437 in USA). According to the documentation, the Encoding.Default property returns the ANSI code page. Possibly there is simply a mix-up here.
    I'm not claiming that this is the correct explanation, but it's a possible one. I also don't want to exclude completely the possibility that an encoding of the posted ciphertext with a charset other than Cp1252 is feasible. However, there are convincing reasons (besides the successful encoding and decryption) for Cp1252:

    • The binary data after Base64 decoding corresponds to the allowed and very characteristic UTF-8 sequences without any exception, so that practically with certainty a UTF-8 encoding can be assumed.
    • The characters after UTF-8 decoding in turn correspond to the characters of the Cp1252 charset but not to those of the Cp437 charset (or another one), so that encoding is possible with Cp1252 but probably with no other charset.
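
Both checks can be sketched in a few lines of Python. The helper names `is_valid_utf8` and `unencodable` are hypothetical, and the sample is only the first 20 Base64 characters of the posted ciphertext, so this is an illustration of the method rather than a full analysis.

```python
import base64

# Code points that are passed through as raw bytes, as in customEncode.
UNDEFINED_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def is_valid_utf8(data: bytes) -> bool:
    """True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

def unencodable(text: str, charset: str) -> set:
    """Return the characters of text that charset cannot represent."""
    bad = set()
    for ch in text:
        if charset == 'cp1252' and ord(ch) in UNDEFINED_CP1252:
            continue  # handled separately as a raw byte
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.add(ch)
    return bad

raw = base64.b64decode('amDDjAsQJjEzw7nFvSpc')  # start of the posted message
print(is_valid_utf8(raw))                        # check from the first bullet
text = raw.decode('utf-8')
for charset in ('cp1252', 'cp437'):
    # An empty set means the charset can represent every character.
    print(charset, sorted(unencodable(text, charset)))
```

For this sample, Cp1252 can represent every character, whereas Cp437 lacks at least Ž (U+017D). Running the same check over complete messages is what makes the conclusion statistically meaningful.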

    Since the assumption that Cp1252 was used rests on an analysis of the characters obtained after UTF-8 decoding, the probability that it is correct depends on the length and number of the analyzed ciphertexts. The posted ciphertext already has a statistically relevant length, but verifying the assumption with further ciphertexts is nonetheless advisable.