python-3.x amazon-web-services encryption aws-glue

Decryption of text from Java jasypt library with Python3


In the database, I have user emails that are encrypted in the backend using the Java Jasypt library with its default configuration. From what I understand, it uses PBEWithMD5AndDES with 1000 iterations to generate the key.

@Bean
StringEncryptor encryptor() {
    StandardPBEStringEncryptor spbe = new StandardPBEStringEncryptor();
    spbe.setPassword(symmetricKey);
    return spbe;
}

The encryptor is then used as follows:

@Before("insertAccount(account)")
public void beforeInsertAccount(Account account) {
    account.setFirstName(encryptor.encrypt(StringUtils.defaultString(account.getFirstName())));
    account.setPhone(encryptor.encrypt(StringUtils.defaultString(account.getPhone())));
    account.setEmail(encryptor.encrypt(StringUtils.defaultString(account.getEmail())));
}

In AWS Glue (an ETL tool) I need to decrypt the users' emails, or at least match encrypted emails from two different tables. I can define a custom transformation by writing a Python 3 script.

I wrote a script based on this gist: https://gist.github.com/jpralves/505e653fd1c7358ad2c540e25e1ee80a It uses the pycryptodome library. The data_to_decrypt variable is initialized with the Jasypt-encrypted value of 'foo@bar.com', using the password 'test'.

def MyTransform (glueContext, dfc) -> DynamicFrameCollection:
    from Crypto.Hash import MD5
    from Crypto.Cipher import DES
    import base64
    import sys
    from pyspark.sql.functions import lit

    newdf = dfc.select(list(dfc.keys())[0]).toDF()

    data_to_decrypt = base64.b64decode("epncHsHYRZd8uIWncULit//8f0mhk8pn")
    password = "test"

    bs = 8
    _iterations = 1000
    salt = data_to_decrypt[:bs]
    data = data_to_decrypt[bs:]

    hasher = MD5.new()
    result = hasher.digest()
    hasher.update(bytearray(password.encode()))
    hasher.update(bytearray(salt))

    for i in range(1, _iterations):
        hasher = MD5.new()
        hasher.update(result)
        result = hasher.digest()

    encoder = DES.new(result[:bs], DES.MODE_CBC, result[bs:bs*2])
    decrypted = encoder.decrypt(bytes(data))

    length = len(decrypted)
    unpadding = int(decrypted[length-1])

    decryptedEmail = ''
    if length - unpadding > 0:
        decryptedEmail = decrypted[:(length - unpadding)].decode("latin")
    else:
        decryptedEmail = decrypted.decode("latin")

    newdf = newdf.withColumn('decryptedEmail', lit(decryptedEmail))

    dyf_filtered = DynamicFrame.fromDF(newdf, glueContext, "aaa")
    return(DynamicFrameCollection({"CustomTransform0": dyf_filtered}, glueContext))

The script outputs nonsense such as "’€ ‡>— ¦õ8Þûð›7e". When I tried to decode the decrypted bytes using any other encoding, it failed.


Solution

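  • Your actual problem is that your first hash is wrong: you call .digest() before the two .update() calls, so result starts out as the MD5 of empty input instead of the MD5 of password + salt. Take the digest after both updates. (Your iterated hashes are correct.) In addition, your unpadding check is weak: PKCS5 padding never exceeds one block, which for DES is 8 bytes, so a valid final byte is between 1 and 8. Even better would be to check all the padding bytes when there is more than one, but I didn't bother.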

    $ cat 71576901.py3
    from Crypto.Hash import MD5
    from Crypto.Cipher import DES
    import base64
    
    data_to_decrypt = base64.b64decode("epncHsHYRZd8uIWncULit//8f0mhk8pn")
    password = "test"
    
    bs = 8
    _iterations = 1000
    salt = data_to_decrypt[:bs]
    data = data_to_decrypt[bs:]
    
    hasher = MD5.new()
    hasher.update(bytearray(password.encode()))
    hasher.update(bytearray(salt))
    result = hasher.digest() # moved down
    
    for i in range(1, _iterations):
        hasher = MD5.new()
        hasher.update(result)
        result = hasher.digest()
    
    encoder = DES.new(result[:bs], DES.MODE_CBC, result[bs:bs*2])
    decrypted = encoder.decrypt(bytes(data))
    
    length = len(decrypted)
    unpadding = int(decrypted[length-1])
    
    if unpadding > 0 and unpadding <= bs: # better check
      print (decrypted[:-unpadding].decode('latin1')) # or other decoding depending on what you encrypted
    else:
      print ('bad') # might better raise, but TBD
    $ python3 71576901.py3
    foo@bar.com
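
The stricter padding check mentioned above (verifying every padding byte, not just the last one) could look roughly like the sketch below. The helper name pkcs5_unpad and the choice to raise ValueError are assumptions made for illustration; they are not part of pycryptodome or of the script above.

```python
def pkcs5_unpad(padded: bytes, block_size: int = 8) -> bytes:
    """Strip PKCS#5 padding, checking every padding byte (hypothetical helper)."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input is empty or not a multiple of the block size")

    pad_len = padded[-1]  # indexing bytes yields an int in Python 3
    if not 1 <= pad_len <= block_size:
        raise ValueError("padding length out of range")

    # All pad_len trailing bytes must carry the same value, pad_len.
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("padding bytes are inconsistent")

    return padded[:-pad_len]
```

In the script above, this would replace the if/else at the end, for example pkcs5_unpad(decrypted).decode('latin1'). A wrong password or key then usually surfaces as a ValueError instead of garbage text, although a wrong key can still produce valid-looking padding by chance.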
    

    Also, be aware that this is very weak encryption and easily broken, but that's a security issue, not a programming one, and off-topic here.
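
For reference, the key derivation that the fix repairs is small enough to write with the standard library alone. The sketch below restates it using hashlib; the function name derive_key_iv is hypothetical, and it assumes the password is plain ASCII (so UTF-8 encoding gives the same bytes) and that the salt is the first 8 bytes of the Base64-decoded value, as in the script above.

```python
import hashlib


def derive_key_iv(password: str, salt: bytes, iterations: int = 1000):
    """Derive a DES key and IV the way the corrected script does (hypothetical helper)."""
    # First round: hash password + salt together, then take the digest.
    digest = hashlib.md5(password.encode("utf-8") + salt).digest()

    # Remaining rounds: re-hash the previous digest, for `iterations` hashes in total.
    for _ in range(iterations - 1):
        digest = hashlib.md5(digest).digest()

    # MD5 yields 16 bytes: the first 8 are the DES key, the last 8 the CBC IV.
    return digest[:8], digest[8:16]
```

The returned pair would be passed to DES.new(key, DES.MODE_CBC, iv) exactly as result[:bs] and result[bs:bs*2] are in the script above.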