Search code examples
encryptioncryptography

Can a cipher be shorter than the original text?


I am reading the book "Serious Cryptography" by Jean-Philippe Aumasson and early on in the book there is a note which says:

"For some ciphers, the ciphertext is the same size as the plaintext; for some others, the ciphertext is slightly longer. However, ciphertexts can never be shorter than plaintexts."

This seems incorrect to me because I don't understand why a cipher can't also compress text that it is encrypting. For example, I can create a cipher where every letter maps to 1 letter up in the alphabet, except the word "THE" maps to 1 and the word "IS" maps to 2.

Encrpying the text "The cat is old" will yield "1 dbu 2 pme" with this cipher, which is shorter than the original text. Is there something I am missing?

Edit: I would also like to restrict the plain text input I accept to only include letters.


Solution

  • Generally encryption acts on input that cannot be compressed. It should be possible to input any kind of message. As most modern ciphers are bit/byte oriented that means that all possible messages of a certain bit/byte size are possible.

    Note that a modern cipher should be IND-CPA secure. This means that any message of the same size should return ciphertext that is indistinguishable. If you have a specific set of messages that increases the ciphertext then obviously they are distinguishable from the other messages. And because of the pigeon hole principle, it is of course impossible to have smaller ciphertext for all possible messages.

    It is possible to use a compression algorithm to compress specific types of messages in case your set of messages is smaller. However, you should keep in mind that the size of the resulting ciphertext may leak information to an adversary. Attacks on compressed audio streams or HTTPS connections are certainly not just a theoretic threat, they are all too practical.

    And beware that even without compression, the ciphertext size will always leak information about the plaintext. Because of above, the plaintext must contain less information than can be encoded by the specific ciphertext size after all.