
SHA-3 variable length hash same as truncating normal hash in Java using BouncyCastle


I need to generate a fixed-length hash of 30 characters from some input data (such as a customer's email address) in Java. After some searching, I found out about the SHA-3 sponge functions, which let me specify the required output length. I implemented the following using Bouncy Castle's SHAKEDigest class.

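import java.nio.charset.StandardCharsets;

import org.bouncycastle.crypto.digests.SHAKEDigest;
import org.bouncycastle.util.encoders.Hex;

public class App {
    public static void main(String[] args) {
        final String message = "Hello World!";
        System.out.println(getHash(message, 64));
        System.out.println(getHash(message, 30));
        System.out.println(getHash(message, 20));
    }

    // Returns a SHAKE128 hash of the message as a hex string.
    // Two hex characters encode one byte, so an odd lengthInCharacters
    // is rounded down to the next even number.
    static String getHash(final String message, final int lengthInCharacters) {
        final byte[] messageBytes = message.getBytes(StandardCharsets.UTF_8);

        final SHAKEDigest digest = new SHAKEDigest(128);

        final byte[] hashBytes = new byte[lengthInCharacters / 2];
        // Absorb the input, then squeeze out exactly hashBytes.length bytes.
        digest.update(messageBytes, 0, messageBytes.length);
        digest.doOutput(hashBytes, 0, hashBytes.length);

        return Hex.toHexString(hashBytes);
    }
}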

If I execute it, I get the following output:

aacfe6ebd3737d9f195c837c5281d3f87646ecd7e43864e1a40456e40f264046
aacfe6ebd3737d9f195c837c5281d3
aacfe6ebd3737d9f195c

I expected the hashes to be completely different depending on the requested length. As it stands, I could just as well generate a plain SHA-256 hash using the JDK's MessageDigest and truncate it to the required length.

Am I doing something wrong or am I misunderstanding the point of those sponge functions?

Full code with unit tests is available at: https://github.com/steinsag/java-dynamic-hash


Solution

  • Nit: the SHAKE functions are actually Extendable-Output Functions (XOFs), built on the Keccak sponge in the same way as the fixed-length SHA3 hashes; see https://en.wikipedia.org/wiki/SHA-3#Instances .

    But the point you seem to have misunderstood is that the underlying sponge makes all of these deterministic: a given instance (parameterization) produces the same output every time for the same input, and the requested output size alone does not change that output. Thus SHA3-256(m) is not the first 256 bits of SHA3-512(m), because the two have different parameters, while SHAKE128(m,256) is the first 256 bits of SHAKE128(m,512) but is not SHAKE256(m,256).

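    Yes, you can truncate any SHA3 hash (or SHA2 hash, for that matter) to a size smaller than its normal size and get a shorter but otherwise equally good crypto hash (pseudo-random, irreversible and collision-resistant for real data), with strength limited by the shorter length: 30 hex characters are 120 bits, which gives roughly 60 bits of collision resistance. People have in fact been doing exactly this for decades. What you can't do safely is extend a fixed-length hash beyond its normal size, which you can with an XOF like SHAKE.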
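
To make the two points above concrete, here is a minimal sketch that uses only the JDK's MessageDigest, with no Bouncy Castle dependency. The class and method names (TruncatedHashDemo, hex, truncatedSha256) are made up for illustration and are not from the question's repository, and the SHA3-256 and SHA3-512 algorithm names assume Java 9 or later. It truncates a SHA-256 hex digest to 30 characters and checks that SHA3-256(m) is not a prefix of SHA3-512(m).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration only.
final class TruncatedHashDemo {

    // Hashes the UTF-8 bytes of the message with the named JDK algorithm
    // and returns the digest as a lowercase hex string.
    static String hex(final String algorithm, final String message) {
        try {
            final MessageDigest md = MessageDigest.getInstance(algorithm);
            final byte[] hash = md.digest(message.getBytes(StandardCharsets.UTF_8));
            final StringBuilder sb = new StringBuilder(hash.length * 2);
            for (final byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    // Returns the first lengthInCharacters hex characters of SHA-256(message).
    // SHA-256 yields 64 hex characters, so longer requests are rejected.
    static String truncatedSha256(final String message, final int lengthInCharacters) {
        if (lengthInCharacters < 0 || lengthInCharacters > 64) {
            throw new IllegalArgumentException("length must be between 0 and 64");
        }
        return hex("SHA-256", message).substring(0, lengthInCharacters);
    }

    public static void main(String[] args) {
        final String message = "Hello World!";

        // A 30-character identifier obtained by plain truncation.
        System.out.println(truncatedSha256(message, 30));

        // Different parameterizations give unrelated outputs, so this
        // prefix check is expected to print false.
        final String sha3_256 = hex("SHA3-256", message);
        final String sha3_512 = hex("SHA3-512", message);
        System.out.println(sha3_512.startsWith(sha3_256));
    }
}
```

For the 30-character requirement in the question, truncatedSha256(email, 30) would serve the same purpose as the SHAKE128 call; the XOF only becomes necessary when more output is needed than a fixed-length hash provides.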