Tags: c, pointers, gcc, casting, strict-aliasing

Need help to resolve warning: dereferencing type-punned pointer will break strict-aliasing rules


I am optimizing a set of C code and came across a warning while fixing some broken code.

The environment is Linux, C99, compiling with -Wall -O2 flags.

Initially a struct text is defined like this:

    struct text {
        char count[2];
        char head[5];
        char textdata[5];
    };

The code is to return pointer T1 and T2 to expected head and textdata strings:

    int main(void) {
        struct text *T1;
        char *T2;
        char data[] = "02abcdeabcde";

        T1 = (struct text *)data;
        T2 = T1->textdata;
        gettextptr((char *)T1, T2);
        printf("\nT1 = %s\nT2 = %s\n", (char *)T1, T2);
        return (0);
    }

    void gettextptr(char *T1, char *T2) {
        struct text *p;
        int count;

        p = (struct text *)T1;
        count = (p->count[0] - '0') * 10 + (p->count[1] - '0');

        while (count--) {
            if (memcmp(T2, T1, 2) == 0) {
                T1 += 2;
                T2 += 2;
            }
        }
    }

This wasn't working as expected. It was expected to return the addresses of the first 'c' and the last 'e'. Through GDB, I found that once execution returns from gettextptr() to the calling function, T1 and T2 no longer hold the updated addresses. Then I tried another approach, 'call by reference' using double pointers:

    int main(void) {
        struct text *T1;
        char *T2;
        char data[] = "02abcdeabcde";

        T1 = (struct text *)data;
        T2 = T1->textdata;
        gettextptr((char **)&T1, &T2);
        printf("\nT1 = %s\nT2 = %s\n", (char *)T1, T2);
        return (0);
    }

    void gettextptr(char **T1, char **T2) {
        struct text *p;
        int count;

        p = (struct text *)(*T1);
        count = (p->count[0] - '0') * 10 + (p->count[1] - '0');

        while (count--) {
            if (memcmp(*T2, *T1, 2) == 0) {
                *T1 += 2;
                *T2 += 2;
            }
        }
    }

When I compile this code with -Wall -O2, I am getting the following GCC warning:

 pointer.c: In function ‘main’:
 pointer.c:23: warning: dereferencing type-punned pointer will break strict-aliasing rules

So:

  1. Was the code calling by value in first case?

  2. Isn't (char **) permitted for casting while keeping strict aliasing rules?

  3. What am I missing to resolve this warning?


Solution

  • The strict aliasing rule is paragraph 6.5/7 of the C standard. It says, in essence, that you may access an object only through an lvalue of one of these types: a compatible type, possibly with additional qualifiers; the corresponding signed or unsigned type; an array, structure, or union type with one of those among its members; or a character type. The diagnostic you received is saying that your code violates that rule, and it does, multiple times.

    You get yourself in trouble very early with:

        T1 = (struct text *)data;
    

    That conversion is allowed, though the resulting pointer is not guaranteed to be correctly aligned, but there's not much you can then do with T1 without violating the strict aliasing rule. In particular, if you dereference it with * or -> -- which is in fact the very next thing you do -- then you access a char array as if it were a struct text. That is not allowed, though the reverse would be a different story.

    Converting T1 to a char * and accessing the pointed-to array through that pointer, as you do later, is among the few things you may do with it.
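
To illustrate the permitted direction, here is a minimal sketch that copies the raw characters into a real struct text object instead of reinterpreting the array. The helper name text_from_bytes is hypothetical, and the sketch assumes the input holds at least 12 characters in the count/head/textdata order from the question.

```c
#include <assert.h>
#include <string.h>

struct text {
    char count[2];
    char head[5];
    char textdata[5];
};

/* Hypothetical helper (not from the original post): fill a genuine
 * struct text object from raw characters, one member at a time.
 * memcpy copies bytes, so no char array is ever accessed through an
 * lvalue of type struct text. Assumes raw holds at least 12 bytes. */
void text_from_bytes(struct text *out, const char *raw)
{
    assert(out != NULL && raw != NULL);

    memcpy(out->count, raw, sizeof out->count);
    raw += sizeof out->count;

    memcpy(out->head, raw, sizeof out->head);
    raw += sizeof out->head;

    memcpy(out->textdata, raw, sizeof out->textdata);
}
```

Copying member by member also avoids depending on any padding the compiler might insert. Note that the members are not NUL-terminated, so they should not be printed with a bare %s.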

    gettextptr() is the same (both versions). It performs the same kind of conversion described above, and dereferences the converted pointer when it accesses p->count. The resulting behavior violates the strict aliasing rule, and is therefore undefined. What GCC is actually complaining about in the second case, however, is probably accessing *T1 as if it were a char *, when it is really a struct text * -- another, separate, strict aliasing violation.

    So, in response to your specific questions:

    1. Was the code calling by value in first case?

    C has only pass by value, so yes. In the first case, you pass two char * pointers by value, which you could then use to modify the caller's char data. In the second case, you pass two char ** pointers by value, which you can and do use to modify the caller's char * variables.
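
The difference between the two cases can be sketched as follows. The function names advance_copy and advance_caller are made up for illustration and do not appear in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Receives a copy of the caller's pointer. Advancing the copy has no
 * effect on the caller's variable. */
void advance_copy(char *p, size_t n)
{
    assert(p != NULL);
    p += n;      /* changes only the local copy */
    (void)p;     /* silence "unused value" style diagnostics */
}

/* Receives the address of the caller's pointer (that address is itself
 * passed by value). Writing through it updates the caller's variable. */
void advance_caller(char **pp, size_t n)
{
    assert(pp != NULL && *pp != NULL);
    *pp += n;
}
```

Given `char data[] = "02abcdeabcde"; char *t = data;`, calling `advance_copy(t, 2)` leaves `t` equal to `data`, while `advance_caller(&t, 2)` leaves it equal to `data + 2`. No cast is needed because `&t` already has type `char **`.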

    2. Isn't (char **) permitted for casting while keeping strict aliasing rules?

    No, absolutely not. Casting to char * (not char **) can allow you to access an object's representation through the resulting pointer, because dereferencing a char * produces an lvalue of character type. Dereferencing a char ** produces an lvalue of type char *, which is not a character type, so that exemption does not apply. No pointer-to-pointer type can serve as a generic conversion target without strict-aliasing implications.

    3. What am I missing to resolve this warning?

    You are missing that what you are trying to do is fundamentally disallowed. C does not permit accessing a char array as if it were a struct text, period. Compilers may nevertheless accept code that does so, but its behavior is undefined.

    Resolve the warning by abandoning the cast-to-structure approach, which is providing only a dusting of syntactic sugar, anyway. It's actually simpler and clearer to get rid of all the casting and write:

        count = ((*T1)[0] - '0') * 10 + ((*T1)[1] - '0');
    

    It's perhaps clearer still to get rid of all the casting and use sscanf:

        sscanf(*T1, "%2d", &count);
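
Putting that together, a cast-free version of the second gettextptr() might look like the sketch below. The name gettextptr_nocast is hypothetical, and the sketch assumes the caller declares both variables as plain char * (so &T1 is already a char **) and that both pointers refer to buffers with enough characters for the comparisons. The loop body is kept as in the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rewrite of the question's second gettextptr(): the
 * two-digit count is read straight from the characters, so no
 * struct text * is ever formed or dereferenced. */
void gettextptr_nocast(char **T1, char **T2)
{
    int count = 0;

    assert(T1 != NULL && *T1 != NULL && T2 != NULL && *T2 != NULL);

    /* Read at most two characters as a decimal count; leave both
     * pointers untouched if no number is found. */
    if (sscanf(*T1, "%2d", &count) != 1)
        return;

    /* Same loop as in the question, guarded against negative counts. */
    while (count-- > 0) {
        if (memcmp(*T2, *T1, 2) == 0) {
            *T1 += 2;
            *T2 += 2;
        }
    }
}
```

A caller would then write something like `char *T1 = data; char *T2 = data + 7; gettextptr_nocast(&T1, &T2);`, with the offset 7 being the position of textdata if the members are laid out without padding.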
    

    Note also that even if it were allowed, your specific access pattern seems to make assumptions about the layout of the structure members that are not justified by the language. Implementations may use arbitrary padding between members and after the last member, and your code cannot accommodate that.
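
If code does depend on the members sitting back to back, that assumption can at least be checked rather than silently trusted. The following is a minimal sketch using offsetof. The function names are hypothetical, and the expected offsets (0, 2, 7) and total size (12) are simply what the question's data string implies.

```c
#include <assert.h>
#include <stddef.h>

struct text {
    char count[2];
    char head[5];
    char textdata[5];
};

/* Hypothetical check: returns 1 when the implementation lays the
 * members out contiguously with no padding, 0 otherwise. */
int text_is_packed(void)
{
    return offsetof(struct text, count) == 0
        && offsetof(struct text, head) == 2
        && offsetof(struct text, textdata) == 7
        && sizeof(struct text) == 12;
}

/* Fail fast (in builds without NDEBUG) if the layout assumption does
 * not hold on this implementation. */
void text_require_packed(void)
{
    assert(text_is_packed());
}
```

Such a check only documents the layout assumption. It does not make accessing a char array through a struct text * valid.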