c99 strict-aliasing

Is casting uint8_t* to uint32_t*, or uint64_t* well-defined in C99 as long as we are sure we do not create aliases?


Consider a C99 program that reads from a read-only binary blob linked into the program's binary through a linker script. The program knows where the blob starts in memory, but the blob's layout is not known at compile time. The blob consists of unsigned 32-bit and 64-bit integers. We took care to make sure that their endianness matches the (data) endianness of the target platform. We also took care to place the blob in memory so that it is 4-byte aligned.

Requirements:

  1. (performance) We want to read both 32-bit and 64-bit integers with the minimum number of instructions each platform allows (e.g. a single load instruction where applicable).

    • we do not want to read the value byte-by-byte and then use shifting and adding to reconstruct the 4B/8B integer.
  2. (portability) This program must run on ARM, x86_64 and MIPS architectures. Also, some architectures have a 32-bit system bus, while others have a 64-bit bus.

    • we do not want to have to maintain arch-specific adaptations for each architecture with inlined assembly code.
    • we do not want to make assumptions about the toolchain used, e.g. we don't want to rely on -fno-strict-aliasing and similar flags.

Seemingly, this could be done with type punning. We know where in memory the value we want to read is located, and we can cast the pointer from its original type (unsigned char*) to either uint32_t* or uint64_t*.

But C99's strict aliasing rules confuse me.

There will be no aliasing, of that we can be sure - we would not be punning on the same memory location to two different types that are not unsigned char. The layout of the binary blob does not allow this.

Question:

Is casting a const uint8_t* to const uint32_t*, or const uint64_t* well-defined in C99, as long as we are sure we do not alias the same pointers to both const uint32_t* and const uint64_t*?


Solution

  • The strict aliasing rules are effectively (pun intended (the 2nd pun intended too)) 6.5p6 and 6.5p7.

    If you read through a declared char buffer, e.g.:

    char buf[4096];
    //...
    read(fd, buf, sizeof(buf));
    //...
    

    and want to do *(uint32_t*)(buf+position), then you're definitely violating

    6.5p7

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

    • a type compatible with the effective type of the object,

    If you mmap or malloc the buffer (make the memory dynamically typed), then it's more complicated, but in any case, the standard-compliant way of reading such a uint32_t--through memcpy--works in either case and typically carries no performance penalty because optimizing compilers recognize memcpy calls and treat them specially.

    Example:

    #include <stdint.h>
    #include <string.h>
    
    uint32_t get32_noalias(void const *P) 
    {
         return *(uint32_t const*)(P); //keeps const; still violates 6.5p7 on a char[] buffer
    }
    
    
    static inline uint32_t get32_inl(void const *P) 
    { 
        uint32_t const*p32 = P; 
        //^optional (might not affect codegen)
        //to assert that P is well-aligned for uint32_t
        uint32_t x; memcpy(&x,p32,sizeof(x)); 
        return x; 
    }
    
    //should generate same code as get32_noalias
    //but without violating 6.5p7 when P points to a char[] buffer
    uint32_t get32(void const *P) 
    { 
        return get32_inl(P);
    }
    

    https://gcc.godbolt.org/z/sGf4rf

    Generated assembly on x86-64:

    get32_noalias:                          # @get32_noalias
            movl    (%rdi), %eax
            retq
    
    get32:                                  # @get32
            movl    (%rdi), %eax
            retq
    

    While *(uint32_t*)p probably won't blow up in your case in practice (if you only do read-only accesses, or read-only accesses intertwined with char-based writes like those done by the read syscall, then it "practically" shouldn't blow up), I don't see a reason to avoid the fully standard-compliant memcpy-based solution.
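
Since the question also asks about uint64_t and the blob is only guaranteed to be 4-byte aligned, the same memcpy approach can be sketched for both widths. The helper names blob_get32 and blob_get64 and the offset parameter are assumptions made for illustration and do not appear in the original code. memcpy places no alignment requirement on its source, so the 64-bit read stays well-defined even at an offset that is only 4-byte aligned.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: names and the byte-offset parameter are
   illustrative assumptions, not part of the original question. */

/* Read a 32-bit value stored in native byte order at blob + offset. */
static inline uint32_t blob_get32(const unsigned char *blob, size_t offset)
{
    uint32_t x;
    memcpy(&x, blob + offset, sizeof x); /* no aliasing or alignment violation */
    return x;
}

/* Read a 64-bit value stored in native byte order at blob + offset.
   The source only needs to be readable; it need not be 8-byte aligned. */
static inline uint64_t blob_get64(const unsigned char *blob, size_t offset)
{
    uint64_t x;
    memcpy(&x, blob + offset, sizeof x);
    return x;
}
```

Whether each helper compiles down to a single load depends on the compiler, the optimization level, and the target's support for unaligned access, so it is worth checking the generated assembly on each platform of interest.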