Tags: security, authentication, encryption, hash, aes-gcm

Why use Authenticated Encryption instead of hashes?


What is the benefit of using Authenticated Encryption schemes like GCM or EAX compared to simpler methods like CRC or hash functions like SHA?

As far as I understand, these methods basically add a Message Authentication Code (MAC) to the message so it can be validated. But the same would be possible if a CRC or hash value were calculated and appended to the plaintext (a MIC). That way it would also be impossible to tamper with the message, because the hash would probably no longer match.

The linked Wikipedia article says MICs don't take the key etc. into account, but I do not understand why this is a problem.


Solution

  • There's conceptually no difference between an Authenticated Encryption scheme (GCM, CCM, EAX, etc.) and providing an HMAC over the encrypted message; the AE algorithms simply constrain and standardize the byte layout (while tending to require less space and time than running encryption and HMAC as two separate, serial operations).

    If you compute your unkeyed digest over the plaintext before encrypting, you do have a tamper-evident algorithm. But computing the digest over the plaintext has two disadvantages compared with computing it over the ciphertext:

    1. If the digest is sent alongside the ciphertext, sending the same message twice sends the same hash, even though the ciphertext differs (due to a different IV or key).
    2. If the ciphertext has been tampered with in an attempt to confuse the decryption routine, you will still process (decrypt) it before discovering the tampering.
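The first disadvantage can be seen in a few lines. This is a minimal sketch, assuming a digest sent in the clear next to the ciphertext; `toy_encrypt` is a hypothetical toy stream cipher (SHA-256 in counter mode) standing in for something like AES-CTR, and the key and message are made up for illustration. It is not secure code.

```python
import hashlib
import secrets

def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Hypothetical toy stream cipher: XOR with SHA-256(key || nonce || counter).
    # Illustration only; real code should use a vetted cipher such as AES.
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        for i in range(len(plaintext) // 32 + 1)
    )
    return bytes(p ^ s for p, s in zip(plaintext, stream))

key = secrets.token_bytes(32)
message = b"transfer 100 to alice"

# Send the same message twice, each time with a fresh random nonce.
sent = []
for _ in range(2):
    nonce = secrets.token_bytes(16)
    ciphertext = toy_encrypt(key, nonce, message)
    plaintext_digest = hashlib.sha256(message).digest()  # unkeyed, sent in the clear
    sent.append((nonce, ciphertext, plaintext_digest))

print(sent[0][1] != sent[1][1])  # ciphertexts differ because the nonces differ
print(sent[0][2] == sent[1][2])  # True: the identical digest reveals the repeat
```

An eavesdropper who cannot read either ciphertext still learns that the two messages are identical, purely from the matching digests.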

    Of course, the disadvantage of digesting afterwards is that, with your unkeyed approach, anyone who tampers with the ciphertext can simply recompute the SHA-256 digest of the modified ciphertext. The solution is not to use an unkeyed digest, but a keyed one, such as HMAC.
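The difference between an unkeyed digest and a keyed one can be sketched with Python's standard `hashlib` and `hmac` modules. This is an illustration under stated assumptions: the "ciphertext" is just random bytes standing in for real encrypted data, and `mac_key` is a hypothetical secret shared only by sender and receiver.

```python
import hashlib
import hmac
import secrets

mac_key = secrets.token_bytes(32)      # shared secret, unknown to the attacker
ciphertext = secrets.token_bytes(32)   # stand-in for real ciphertext

# Sender computes an unkeyed digest and, separately, an HMAC over the ciphertext.
sha_tag = hashlib.sha256(ciphertext).digest()
hmac_tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()

# Attacker flips one bit of the ciphertext in transit.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

# Unkeyed: the attacker just recomputes SHA-256, so the receiver's check passes.
forged_sha_tag = hashlib.sha256(tampered).digest()
sha_ok = hmac.compare_digest(hashlib.sha256(tampered).digest(), forged_sha_tag)

# Keyed: without mac_key the attacker cannot compute a valid HMAC for the
# tampered bytes, so passing along the original tag fails verification.
hmac_ok = hmac.compare_digest(
    hmac.new(mac_key, tampered, hashlib.sha256).digest(), hmac_tag
)

print(sha_ok, hmac_ok)  # True False
```

`hmac.compare_digest` is used for the comparison because it runs in constant time, which avoids leaking how many tag bytes matched.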

    The options are:

    • Encrypt-only: Subject to tampering. Assuming a new IV is used for each message (and ECB isn't used), it does not reveal when a message repeats.
    • Digest-only: Subject to tampering. Message is plaintext.
    • MAC-only: Not subject to tampering. Message is plaintext.
    • Digest-then-Encrypt (DtE - digest is itself encrypted): Ciphertext corruption attacks are possible. Tampering with the plaintext is possible, if it is known. Message reuse is not revealed.
    • Digest-and-Encrypt (D&E/E&D - digest the plaintext, send the digest unencrypted): Ciphertext corruption attacks are possible. Tampering with the plaintext is possible, if it is known. Message reuse is revealed via the digest not changing.
    • Encrypt-then-Digest (EtD): This guards against transmission errors, but since any attacker can just recompute the digest this is the same as encrypt-only.
    • MAC-then-Encrypt (MtE): Same strengths as DtE, but even if the attacker knew the original plaintext and what they had tampered it to they cannot alter the MAC (unless the plaintext is being altered to an already-known message+MAC).
    • MAC-and-Encrypt (M&E/E&M): Like D&E, this reveals message reuse. Like MtE, it is still vulnerable to ciphertext corruption and to a very limited set of tampering attacks.
    • Encrypt-then-MAC (EtM): Any attempt to alter the ciphertext is detected when the MAC fails to validate, and this check can be done before the ciphertext is processed. Message reuse is not revealed, since the MAC is computed over the ciphertext.
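The DtE weakness "tampering with the plaintext is possible, if it is known" can be demonstrated when the cipher is a malleable stream cipher (as CTR mode is). This is a minimal sketch, assuming such a cipher: `toy_xor_cipher` is a hypothetical SHA-256 keystream standing in for AES-CTR, and the two messages are invented for illustration. The attacker never learns the key; knowing the original plaintext is enough to rewrite both the message and its encrypted digest.

```python
import hashlib
import secrets

def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Hypothetical toy stream cipher (XOR with a SHA-256 keystream); not secure.
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(d ^ s for d, s in zip(data, stream))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key, nonce = secrets.token_bytes(32), secrets.token_bytes(16)

# Sender: Digest-then-Encrypt, i.e. encrypt(plaintext || SHA-256(plaintext)).
original = b"pay alice $100"
ciphertext = toy_xor_cipher(key, nonce, original + hashlib.sha256(original).digest())

# Attacker knows `original` (but not the key) and picks a same-length replacement.
replacement = b"pay mallory $9"
delta = xor(original + hashlib.sha256(original).digest(),
            replacement + hashlib.sha256(replacement).digest())
tampered = xor(ciphertext, delta)

# Receiver: decrypt, split off the 32-byte digest, and check it.
decrypted = toy_xor_cipher(key, nonce, tampered)
body, digest = decrypted[:-32], decrypted[-32:]
print(body, digest == hashlib.sha256(body).digest())  # b'pay mallory $9' True
```

With MtE the same trick fails, because the attacker cannot compute the HMAC of the replacement without the MAC key.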

    EtM is the safest approach in the general case. One of the problems an AE algorithm solves is that it takes the question of how to combine a MAC and a cipher out of the developer's hands and puts it into the hands of a cryptographer.
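A hand-rolled Encrypt-then-MAC construction might look roughly like the sketch below. It assumes separate encryption and MAC keys, a random 16-byte nonce, and a hypothetical toy stream cipher (`toy_xor_cipher`) standing in for a real one such as AES-CTR; `etm_seal` and `etm_open` are illustrative names, not a library API. The HMAC covers the nonce and ciphertext and is verified before any decryption happens.

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 16, 32

def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Hypothetical toy stream cipher (SHA-256 keystream); illustration only.
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(d ^ s for d, s in zip(data, stream))

def etm_seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    ciphertext = toy_xor_cipher(enc_key, nonce, plaintext)
    # MAC over nonce + ciphertext (not over the plaintext).
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def etm_open(enc_key: bytes, mac_key: bytes, message: bytes) -> bytes:
    if len(message) < NONCE_LEN + TAG_LEN:
        raise ValueError("message too short")
    nonce = message[:NONCE_LEN]
    ciphertext, tag = message[NONCE_LEN:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    # Reject before decrypting: tampered ciphertext never reaches the cipher.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return toy_xor_cipher(enc_key, nonce, ciphertext)

enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
sealed = etm_seal(enc_key, mac_key, b"hello")
print(etm_open(enc_key, mac_key, sealed))  # b'hello'

# Flip one bit of the first ciphertext byte.
tampered = sealed[:NONCE_LEN] + bytes([sealed[NONCE_LEN] ^ 1]) + sealed[NONCE_LEN + 1:]
try:
    etm_open(enc_key, mac_key, tampered)
except ValueError as err:
    print(err)  # authentication failed
```

Every choice here (key separation, what the MAC covers, tag placement, verify-before-decrypt) is a place where a hand-rolled scheme can go wrong. An AE mode such as AES-GCM (for example, the `AESGCM` class in the third-party `cryptography` package) makes those decisions in a single, standardized primitive.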