
Reproducible cryptographic hashes of Rust structs/enums


Currently I'm using the following code to get SHA-256 hashes of Rust structs and enums.

use serde::Serialize;
use sha2::{Digest, Sha256};

pub fn sha256<T: Serialize>(ser: T) -> [u8; 32] {
    // Serialize to a RON string, then hash its UTF-8 bytes.
    let str = ron::ser::to_string(&ser).expect("serialization has failed");

    let mut hasher = Sha256::new();
    hasher.update(str);
    hasher.finalize().into()
}

This works, but is far from ideal:

  • If the RON serialisation format ever changes, the hashes will change with it.
  • The intermediate string serialisation wastes CPU cycles and allocations.

There is a .hash() method on many types, but that is for fast 64-bit non-cryptographic hashing (as used by HashMap and friends).

How can I cryptographically hash arbitrary Rust structs and enums, such that the hashes will be identical regardless of architecture/word-size/endianness? (I do not use usize in these.)


Solution

  • If you want to hash an object with a cryptographic hash, you must necessarily turn it into a stream of bytes, since that's the only thing that cryptographic hashes accept. We would generally call this serialization.

    There are some things you can do:

    • Find the fastest general-purpose serialization format you can. JSON is not good for this, since it doesn't serialize byte sequences efficiently, so you could try CBOR, Msgpack, or some other binary format.
    • Add tests to your code that common structures hash to expected values so that you can verify that they work as expected and avoid breaking things unexpectedly.
    • Add a version field to your hash, if possible, so you can bump the version if you need to change the serializer or the byte structure of the serialization.
    • Use a faster hash than SHA-256, such as SHA-512, SHA-512/256, or BLAKE2b (on 64-bit systems) or BLAKE2s (on 32-bit systems) to reduce the cost of the overall operation.
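    The version-field idea above can be sketched as a small framing step: prefix a scheme-version byte to the serialized bytes before they are fed to the hash, so the version can be bumped if the serializer or byte layout ever changes. The constant and function names below are hypothetical, and the payload stands in for whatever binary serializer (CBOR, MessagePack, ...) you choose.

    ```rust
    // Version of the hashing scheme; bump when the encoding changes.
    const HASH_SCHEME_VERSION: u8 = 1;

    // Frame serialized bytes as: [version byte] ++ payload.
    // The result is what would actually be passed to Sha256::update.
    fn versioned_bytes(serialized: &[u8]) -> Vec<u8> {
        let mut buf = Vec::with_capacity(1 + serialized.len());
        buf.push(HASH_SCHEME_VERSION);
        buf.extend_from_slice(serialized);
        buf
    }
    ```

    Because the version byte is part of the hashed input, two encodings of the same value under different scheme versions produce different digests, which is exactly what you want when the old hashes can no longer be reproduced.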

    Alternatively, you could try building a custom Hasher implementation which can also output a SHA-256 value. Your structure would then need to implement Hash instead of Serialize and you'd do the hashing incrementally. This might or might not be faster.
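    A minimal sketch of that incremental approach, using only the standard library: a Hasher whose write method forwards every byte into an accumulator. In a real implementation the Vec would be replaced by a sha2::Sha256 being updated in place (that swap is an assumption, not shown here). Note that the default integer methods on Hasher use native endianness, so they must be overridden with a fixed byte order to keep the result architecture-independent.

    ```rust
    use std::hash::{Hash, Hasher};

    // Collects the canonical byte stream produced by Hash implementations.
    // Stand-in for feeding bytes to a cryptographic hash incrementally.
    struct ByteStreamHasher {
        bytes: Vec<u8>,
    }

    impl Hasher for ByteStreamHasher {
        fn write(&mut self, bytes: &[u8]) {
            self.bytes.extend_from_slice(bytes);
        }

        // Pin integer encodings to little-endian; the defaults use
        // native endianness, which would break reproducibility.
        fn write_u32(&mut self, n: u32) {
            self.write(&n.to_le_bytes());
        }
        fn write_u64(&mut self, n: u64) {
            self.write(&n.to_le_bytes());
        }
        // ...and similarly for the remaining integer widths.

        fn finish(&self) -> u64 {
            0 // unused; only the byte stream matters here
        }
    }

    // Turn any Hash type into its canonical byte stream.
    fn canonical_bytes<T: Hash>(value: &T) -> Vec<u8> {
        let mut h = ByteStreamHasher { bytes: Vec::new() };
        value.hash(&mut h);
        h.bytes
    }
    ```

    Your types would then derive Hash instead of Serialize, and the digest would be computed over the byte stream this produces.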