Tags: python, java, aes, encryption-symmetric

AES Encryption in Java porting to Python


DISCLAIMER: This is INSECURE encryption that I've inherited, and everyone involved knows that. What I'm trying to do here is a first step in getting off this legacy system, which has even bigger problems than this.

I have an existing system in Java that I am attempting to port to Python that performs an AES encryption as such:

public static String encrypt(String text, SecretKey secretKey) throws Exception {
    byte[] cipherText = null;
    String encryptedString = null;

    // get a cipher object for the configured algorithm (AES)
    Cipher cipher = Cipher.getInstance(SYMMETRIC_KEY_ALGORITHM); // AES

    // encrypt the plain text using the symmetric secret key
    cipher.init(Cipher.ENCRYPT_MODE, secretKey);
    cipherText = cipher.doFinal(text.getBytes());
    encryptedString = Base64.getEncoder().encodeToString(cipherText);

    return encryptedString;
}

The problem I'm having is finding the combination of AES settings that produces the same result in Python using the Cryptography library (Python Cryptography).

I see lots of examples that use the seemingly defunct PyCrypto library, which I cannot even get to install on my Windows system, much less work.

My latest attempt is like this and I do get encryption but it doesn't match the Java AES output:

import base64
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = b'Some random key '
cipher = Cipher(algorithms.AES(key), modes.GCM(b'000000000000'))
encryptor = cipher.encryptor()
ct = encryptor.update(ENC_MESSAGE)

base64EncodedStr = base64.b64encode(ct).decode('utf-8')
print('BASE64:', base64EncodedStr)

Update based on rzwitserloot's answer.

I changed the mode from GCM to ECB and the result I'm now getting in Python is almost the same. First of all here is what the updated code looks like:

key = b'Some random key '

cipher = Cipher(algorithms.AES(key), modes.ECB())
encryptor = cipher.encryptor()
ct = encryptor.update(PLAINTEXT.encode('utf-8'))

base64EncodedStr = base64.b64encode(ct).decode('utf-8')
print('Encrypted:', base64EncodedStr)

The reference (i.e. Java) output is 1004 characters long while the Python output is only 984 characters long. The two strings are identical up to the last 3 characters of the Python output.

I did check with decryption and found that both encrypted strings decrypt to the same plaintext.
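The 1004 versus 984 character lengths reported above are consistent with the Python ciphertext being exactly one 16-byte AES block short, rather than with a different key or mode. A minimal sketch of the arithmetic, using only the standard library; the helper name `b64_len` is made up for illustration, and it assumes standard padded base64 on both sides:

```python
import base64


def b64_len(n_bytes: int) -> int:
    """Length of padded base64 text for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# 47 AES blocks (752 bytes) versus 46 AES blocks (736 bytes).
java_bytes, python_bytes = 47 * 16, 46 * 16
print(b64_len(java_bytes), b64_len(python_bytes))

# Cross-check the formula against the real encoder.
assert len(base64.b64encode(bytes(java_bytes))) == b64_len(java_bytes)
assert len(base64.b64encode(bytes(python_bytes))) == b64_len(python_bytes)
```

752 bytes encode to 1004 characters and 736 bytes to 984, which would fit a plaintext whose last, partial block was padded and encrypted by Java but never emitted by the Python `update()` call. Because 736 is not a multiple of 3, the Python string also ends in a short base64 group, which would explain why the tail differs.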

FINAL UPDATE:

The padding was the problem. I updated the code to use PKCS7 padding like this and I am now getting the same result from Java and Python:

from cryptography.hazmat.primitives import padding as symmetric_padding

padder = symmetric_padding.PKCS7(algorithms.AES.block_size).padder()
padded_data = padder.update(PLAINTEXT.encode('utf-8')) + padder.finalize()

ct = encryptor.update(padded_data) + encryptor.finalize()
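Putting the pieces together, here is a sketch of a Python counterpart to the Java `encrypt` method. It assumes the Java side resolves `"AES"` to `AES/ECB/PKCS5Padding` (the default for the standard OpenJDK provider) and that the Java platform default charset is UTF-8. The function names `encrypt` and `decrypt` and the sample plaintext are illustrative only. ECB remains insecure and is used here solely to match the legacy output.

```python
import base64

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt(text: str, key: bytes, charset: str = "utf-8") -> str:
    """AES/ECB with PKCS7 padding, returned as base64 text."""
    # Pad to a multiple of the 128-bit block size (PKCS5 and PKCS7 agree here).
    padder = padding.PKCS7(algorithms.AES.block_size).padder()
    padded = padder.update(text.encode(charset)) + padder.finalize()

    encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    return base64.b64encode(ciphertext).decode("ascii")


def decrypt(encoded: str, key: bytes, charset: str = "utf-8") -> str:
    """Inverse of encrypt, useful for a round-trip check."""
    decryptor = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
    padded = decryptor.update(base64.b64decode(encoded)) + decryptor.finalize()
    unpadder = padding.PKCS7(algorithms.AES.block_size).unpadder()
    return (unpadder.update(padded) + unpadder.finalize()).decode(charset)


key = b"Some random key "  # 16 bytes, so AES-128
token = encrypt("hello world", key)
print(token)
print(decrypt(token, key))
```

A round trip through `decrypt` only shows the Python side is self-consistent. Matching the legacy system still has to be confirmed by comparing against a string produced by the Java code with the same key.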

Solution

  • You need everything to be the exact same between both versions. This is not currently true; fix that.

    Your java code uses:

    • AES at block size 128, because that's just how AES rolls. Nothing to configure.
    • AES key size of 128, 192, or 256 bits.
    • The key.
    • The Mode of Operation. Your java code uses ECB (which is insecure). You've told your python code to use GCM. That's, obviously, not going to work. You need to specify ECB there, too.
    • Given that it's ECB, there is no IV.
    • The padding mode. Java code is doing PKCS5Padding here.
    • Crypto is fundamentally byte based but you're trying to encrypt strings, which aren't. That means the string is being converted to bytes, and that means a charset encoding is used. You didn't specify in your java code which means you get 'platform default'. Horrible idea, but if you can't change the java side, figure out what that is, and use the same encoding in your python code.
    • Finally, your java code base64's the result.
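To illustrate the charset point from the list above: the same string can turn into different bytes, and therefore different ciphertext, depending on the encoding. This sketch assumes, purely as an example, that the Java platform default is windows-1252 (`cp1252` in Python); on Java 18 and later the default is UTF-8, so check the actual JVM before picking one.

```python
text = "café"

utf8_bytes = text.encode("utf-8")      # 'é' becomes two bytes
cp1252_bytes = text.encode("cp1252")   # 'é' becomes one byte

print(utf8_bytes, len(utf8_bytes))
print(cp1252_bytes, len(cp1252_bytes))

# Pure ASCII text encodes identically under both charsets.
print("cafe".encode("utf-8") == "cafe".encode("cp1252"))
```

If the plaintext is guaranteed to be ASCII, this difference cannot be the cause of a mismatch.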

    For most of these, I simply can't tell you; the code you pasted is insufficient. For example, AES key size and whether the keys are actually identical? I have no idea - you tell me. How did you make that SecretKey key object?
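One way to rule out a key mismatch is to compare the raw key bytes on both sides. A small sketch for the Python side, assuming the key from the question; on the Java side the matching value would come from `secretKey.getEncoded()`, printed as hex.

```python
key = b"Some random key "

# AES accepts only 16-, 24-, or 32-byte keys (128, 192, or 256 bits).
key_bits = len(key) * 8
assert key_bits in (128, 192, 256), f"unsupported AES key size: {key_bits} bits"

# Print the key as hex so it can be compared with the Java key bytes.
print(f"AES-{key_bits}", key.hex())
```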

    I'm not as familiar with Python, but note that `base64.b64encode(...)` returns bytes, and the trailing `.decode('utf-8')` only turns those bytes into a string; it does not undo the base64 encoding. So this part matches: your Java code base64-encodes once and that's that, and your Python code does the same.

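    Check the padding, too. Java's default here is PKCS5Padding, but the Python cryptography library applies no padding on its own, so you have to add PKCS7 padding explicitly (for AES's 16-byte blocks, PKCS5 and PKCS7 padding produce the same bytes).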
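On padding: Java's `PKCS5Padding` for AES behaves like PKCS7 with a 16-byte block. A hand-rolled sketch of that scheme, for illustration only; `pkcs7_pad` is a hypothetical helper, and in real code the `padding.PKCS7` class from the cryptography library does this job.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n


# 5 bytes of data are followed by 11 bytes of value 0x0b.
print(pkcs7_pad(b"hello"))

# Input that is already block-aligned still gains a full block of padding.
print(len(pkcs7_pad(b"A" * 16)))
```

This is why padded ciphertext is always a whole number of blocks and never the same length as unpadded input.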

    That leaves the mode of operation, which you 100% mismatched between Java and Python given what little code you pasted, and the way you construct the keys. If the text you're encrypting isn't straight ASCII, it's somewhat likely charset encoding is also causing a difference.

    It's crypto. One tiny difference and the outputs will be quite different.