python string encryption aes pycryptodome

Using AES with CBC mode to encrypt UTF-16 into UTF-16


I am using pycryptodome for encryption in my application. Part of the application requires me to open a file, encrypt the data in the file, and encrypt the file's name. I used AES in CBC mode (since it's my preferred method when dealing with files), and the file encryption class works great:

from Crypto.Cipher import AES
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

class Cipher:
    def __init__(self, password: str, key: bytes = None):
        # password_as_key is a helper that is not shown here; its result is used as the PBKDF2 salt
        self.key = PBKDF2(password, key if key else password_as_key(password), dkLen=32)
        self.cipher = AES.new(self.key, AES.MODE_CBC, iv=get_random_bytes(16))

    def encrypt(self, data: bytes) -> bytes:
        # pad to the AES block size, encrypt, then append the 16-byte IV
        return self.cipher.encrypt(pad(data, AES.block_size)) + self.cipher.iv

    # decryption mirrors encryption, but the IV has to be split off the end first
    def decrypt(self, data: bytes) -> bytes:
        encrypted_data = data[:-16]
        iv = data[-16:]
        self.cipher = AES.new(self.key, AES.MODE_CBC, iv=iv)
        return unpad(self.cipher.decrypt(encrypted_data), AES.block_size)

I wanted to use this class for the name encryption as well, but there's an issue. The encrypted name has to be a valid file name on the file system, so I can't just save it as a bunch of raw bytes. Simply decoding the ciphertext as text isn't possible either: AES output is effectively arbitrary bytes, so I can't guarantee that it is valid in any encoding, whether UTF-8, UTF-16 or even base64 (I tried them all, and they all failed). So I'm wondering: is there any way to encrypt a string into a string in this case?

Btw, I know I said in the title that I want to use AES with CBC mode to do the encryption, and this is still true. But if there's no other way to solve this problem, then I am fine with using other encryption methods, as long as they are as secure and fast as AES encryption.


Solution

  • There are at least three options I can see:

    1. FPE or format-preserving encryption: this is probably the option that most fits your requirements. It encrypts a text over any alphabet into a text with the same alphabet and the same size. The problem is that it is a pretty hard assignment, and implementations of FPE modes such as FF1 and FF3 are not that common. FPE security is a tricky subject and it won't be as fast as plain AES, but probably fast enough.

    2. CBC + base conversion: it is possible to encrypt using any scheme such as CBC and then to perform base conversion to indices within an alphabet that contains just those characters that are valid for a filename (possibly excluding the dot). This will expand the filename though, both because CBC has an overhead (padding and IV) and because the ciphertext uses all possible values for a byte.

      a. Base conversion can be tricky when used for larger ciphertext if you want to use all possible characters (for a specific OS); it takes a lot of CPU if not implemented right.

      b. To make it easier you can use base64url to encode the encrypted file name as suggested in the comments, e.g. to make it <base64offullfilename>.b64 but beware of size limitations for the filename & path.

    3. Store it as meta information: probably the easiest option is to store the filename as part of the ciphertext and then use a serial number or similar (e.g. encrypted_file_001.bin). You'd just have to distinguish the filename from the content if you do this. If you don't want to do this yourself you could put the file(s) in an archive and encrypt that.
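To illustrate option 3, here is a minimal sketch of one way to keep the filename and the content apart: put a length-prefixed copy of the name in front of the content before encrypting. The function names and the 2-byte length prefix are assumptions made for this illustration, not part of any library; only the serial naming scheme (encrypted_file_001.bin) comes from the text above.

```python
import struct
from typing import Tuple


def pack_name_and_content(name: str, content: bytes) -> bytes:
    """Return length-prefixed UTF-8 file name followed by the file content."""
    encoded_name = name.encode("utf-8")
    # A 2-byte big-endian length limits the name to 65535 bytes (an assumption
    # of this sketch); struct.pack raises struct.error for longer names.
    return struct.pack(">H", len(encoded_name)) + encoded_name + content


def unpack_name_and_content(blob: bytes) -> Tuple[str, bytes]:
    """Split a decrypted blob back into the original name and content."""
    (name_length,) = struct.unpack(">H", blob[:2])
    name = blob[2:2 + name_length].decode("utf-8")
    return name, blob[2 + name_length:]


def serial_name(index: int) -> str:
    """Neutral on-disk name that reveals nothing about the original name."""
    return f"encrypted_file_{index:03d}.bin"
```

The packed bytes would be what gets passed to the encrypt method of the Cipher class from the question, with the result written to a file named by serial_name; after decrypting, unpack_name_and_content recovers both parts.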

    At this point you'll probably have to choose an option before any implementation can be made. As the first one requires a lot of understanding and 2a is quite a lot of work, I'd suggest taking option 2b or 3.
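To give an idea of what the base conversion in 2a involves, here is a minimal sketch that treats the ciphertext as one big integer and rewrites it in a custom alphabet. The alphabet (digits plus lowercase letters) and the function names are assumptions chosen for illustration; a real one would be picked per target OS. The repeated big-integer division is quadratic in the input length, which should be acceptable for short inputs such as filenames but is the kind of CPU cost mentioned in 2a for larger ciphertexts.

```python
# Hypothetical alphabet: characters assumed to be valid in file names everywhere.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def bytes_to_alphabet(data: bytes, alphabet: str = ALPHABET) -> str:
    """Convert arbitrary bytes to a string that only uses `alphabet`."""
    # Prefix a 0x01 byte so leading zero bytes survive the integer round trip.
    number = int.from_bytes(b"\x01" + data, "big")
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def alphabet_to_bytes(text: str, alphabet: str = ALPHABET) -> bytes:
    """Inverse of bytes_to_alphabet; raises KeyError on foreign characters."""
    base = len(alphabet)
    index = {char: position for position, char in enumerate(alphabet)}
    number = 0
    for char in text:
        number = number * base + index[char]
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the 0x01 sentinel byte added during encoding
```

With 36 symbols each character carries a little over 5 bits, so the resulting name is noticeably longer than the base64url version in 2b.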

    Remarks:

    • not all UTF-16 strings are valid filenames; things like colons and slashes are usually not put in filenames;
    • base64url is explicitly made to be safe to use for filenames as well.
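As a rough sketch of option 2b: the important point is that the ciphertext bytes are base64url-encoded into text, rather than decoded as if they already were text. The helper names and the handling of the "=" padding are assumptions made for this illustration; the .b64 suffix follows the naming suggested in 2b. A stand-in byte string takes the place of real ciphertext so the snippet runs with only the standard library.

```python
import base64

SUFFIX = ".b64"  # marker extension, as in <base64offullfilename>.b64


def ciphertext_to_filename(ciphertext: bytes) -> str:
    """Encode raw ciphertext bytes as a file name."""
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/'.
    encoded = base64.urlsafe_b64encode(ciphertext).decode("ascii")
    # Dropping the '=' padding is optional; it is restored when decoding.
    return encoded.rstrip("=") + SUFFIX


def filename_to_ciphertext(name: str) -> bytes:
    """Recover the raw ciphertext bytes from an encoded file name."""
    if not name.endswith(SUFFIX):
        raise ValueError("not an encoded file name")
    encoded = name[:-len(SUFFIX)]
    encoded += "=" * (-len(encoded) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(encoded)


if __name__ == "__main__":
    # Stand-in for the bytes returned by an encrypt call (assumption).
    fake_ciphertext = bytes(range(32))
    name = ciphertext_to_filename(fake_ciphertext)
    print(name, filename_to_ciphertext(name) == fake_ciphertext)
```

With the Cipher class from the question, the input would be the result of encrypting the UTF-8 encoded file name, and decryption would run on the output of filename_to_ciphertext. Even a short name produces a 32-byte ciphertext (one padded block plus the IV), which becomes 43 characters plus the suffix, so the size limits mentioned in 2b are worth checking for long names.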