Search code examples
cglibcstrict-aliasing

Strict aliasing rule and strlen implementation of glibc


I have been reading about the strict aliasing rule for a while, and I'm starting to get really confused. First of all, I have read these questions and some answers:

According to them (as far as I understand), accessing a char buffer using a pointer to another type violates the strict aliasing rule. However, the glibc implementation of strlen() has such code (with comments and the 64-bit implementation removed):

size_t strlen(const char *str)
{
    const char *char_ptr;
    const unsigned long int *longword_ptr;
    unsigned long int longword, magic_bits, himagic, lomagic;

    for (char_ptr = str; ((unsigned long int) char_ptr 
             & (sizeof (longword) - 1)) != 0; ++char_ptr)
       if (*char_ptr == '\0')
           return char_ptr - str;

    longword_ptr = (unsigned long int *) char_ptr;

    himagic = 0x80808080L;
    lomagic = 0x01010101L;

    for (;;)
    { 
        longword = *longword_ptr++;

        if (((longword - lomagic) & himagic) != 0)
        {
            const char *cp = (const char *) (longword_ptr - 1);

            if (cp[0] == 0)
                return cp - str;
            if (cp[1] == 0)
                return cp - str + 1;
            if (cp[2] == 0)
                return cp - str + 2;
            if (cp[3] == 0)
                return cp - str + 3;
        }
    }
}

The longword_ptr = (unsigned long int *) char_ptr; line obviously aliases an unsigned long int to char. I fail to understand what makes this possible. I see that the code takes care of alignment problems, so no issues there, but I think this is not related with the strict aliasing rule.

The accepted answer for the third linked question says:

However, there is a very common compiler extension allowing you to cast properly aligned pointers from char to other types and access them, however this is non-standard.

Only thing comes to my mind is the -fno-strict-aliasing option, is this the case? I could not find it documented anywhere what glibc implementors depend on, and the comments somehow imply that this cast is done without any concerns like it is obvious that there will be no problems. That makes me think that it is indeed obvious and I am missing something silly, but my search failed me.


Solution

  • In ISO C this code would violate the strict aliasing rule. (And also violate the rule that you cannot define a function with the same name as a standard library function). However this code is not subject to the rules of ISO C. The standard library doesn't even have to be implemented in a C-like language. The standard only specifies that the implementation implements the behaviour of the standard functions.

    In this case, we could say that the implementation is in a C-like GNU dialect, and if the code is compiled with the writer's intended compiler and settings then it would implement the standard library function successfully.