Tags: c++, strict-aliasing

Avoiding strict aliasing violation in hash function


How can I avoid a strict aliasing violation when trying to modify the char* result of a sha256 function?

Compute hash value:

std::string sha = sha256("some text");
const char* sha_result = sha.c_str();
// reinterpret_cast cannot drop const, so the target pointer must be const too.
const unsigned long* mod_args = reinterpret_cast<const unsigned long*>(sha_result);

Then I get two 64-bit pieces:

unsigned long a = mod_args[1] ^ mod_args[3] ^ mod_args[5] ^ mod_args[7];
unsigned long b = mod_args[0] ^ mod_args[2] ^ mod_args[4] ^ mod_args[6]; 

Then I get the result by concatenating those two pieces:

unsigned long long result = (((unsigned long long)a) << 32) | b;

Solution

  • As depressing as it might sound, the only truly portable, standard-conforming and efficient way of doing this is memcpy(). Accessing the char buffer through a pointer obtained with reinterpret_cast<unsigned long*> violates the strict aliasing rule, and using a union (as is often suggested) triggers undefined behaviour in C++ when you read from a member other than the one you last wrote.

    However, since most compilers will optimize away memcpy() calls, this is not as depressing as it sounds.

    For example, the following code with two memcpy() calls:

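    #include <cstring>  // std::memcpy

    // Assumed to be defined elsewhere and to return a writable buffer.
    char* sha256(const char* text);

    char* foo() {
      char* sha = sha256("some text");
      unsigned int mod_args[8];
      std::memcpy(mod_args, sha, sizeof(mod_args));  // bytes -> integers
      mod_args[5] = 0;
      std::memcpy(sha, mod_args, sizeof(mod_args));  // integers -> bytes
      return sha;
    }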
    

    produces the following optimized assembly:

    foo():                                # @foo()
            pushq   %rax
            movl    $.L.str, %edi
            callq   sha256(char const*)
            movl    $0, 20(%rax)
            popq    %rdx
            retq
    

    As you can see, no memcpy() call remains: the value is modified in place.
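
Applied to the code in the question, the memcpy() approach might look like the sketch below. It rests on a few assumptions that are not in the original post: the digest holds at least 32 raw bytes, the two pieces are meant to be 32-bit halves (the shift by 32 suggests so, hence std::uint32_t rather than unsigned long), and fold_digest is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): folds the first 32 bytes
// of a digest into one 64-bit value without violating strict aliasing.
std::uint64_t fold_digest(const std::string& digest) {
    std::uint32_t words[8];
    if (digest.size() < sizeof(words)) {
        throw std::invalid_argument("digest must hold at least 32 bytes");
    }
    // Copy the bytes into real uint32_t objects instead of reading the char
    // buffer through a casted pointer.
    std::memcpy(words, digest.data(), sizeof(words));

    // XOR the odd-indexed and even-indexed words, as in the question.
    const std::uint32_t a = words[1] ^ words[3] ^ words[5] ^ words[7];
    const std::uint32_t b = words[0] ^ words[2] ^ words[4] ^ words[6];

    // Widen before shifting so that `a` lands in the upper 32 bits.
    return (static_cast<std::uint64_t>(a) << 32) | b;
}
```

Calling fold_digest(sha256("some text")) would then replace the three steps from the question, provided sha256 returns a std::string as it does there. The numeric result still depends on the machine's byte order, exactly as the pointer-cast version would.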