Tags: c, pointers, compiler-optimization, strict-aliasing

Breaking strict aliasing and getting away with it


It is well known that optimizing C (and probably C++) compilers can break code that does something like this:

type1 foo, *pfoo;
type2_incompatible_with_type1 *pbar;
/* ... */
pfoo = &foo;
pbar = (type2_incompatible_with_type1*)pfoo;
/* ... */
while(condition){
    change_type2_value(pbar);
    /* ... */
    use_type1_value_which_should_have_changed(*pfoo);
    /* ... */
}

where change_type2_value looks something like this:

void change_type2_value(type2_incompatible_with_type1 *pbar){
    *pbar = SOME_VALUE;
}

The compiler may assume that pfoo and pbar do not alias, even though they point to the same memory location. It therefore will not necessarily reload the contents of *pfoo after each write through pbar, even though that write also changes the memory pointed to by pfoo.

However, if we do something like this:

type1 foo, *pfoo;
/* ... */
pfoo = &foo;
/* ... */
while(condition){
    change_type2_value((type2_incompatible_with_type1*)pfoo);
    /* ... */
    use_type1_value_which_should_have_changed(*pfoo);
    /* ... */
}

Although the dereference now occurs inside the change_type2_value function, this still technically breaks strict aliasing because, in reality, our pointer points to an object of a different type. However, are there any real conditions under which a strict-aliasing optimization in a real compiler could break this code too?

I think it would be possible only if the compiler looked beyond the current function's scope, at what happens inside the other function, just to decide whether it must reload the memory pointed to by a variable that has been passed to it. That doesn't seem too feasible to me.

Or is it possible that a real compiler could do something as nasty as assuming that the function in question will not change the memory pointed to by our pointer if we pass it in cast to a pointer to an incompatible type?


Solution

  • Strictly conforming? No.

    Look at the following:

    void bar(){
        int x=7;
        foo();
        printf("%d\n",x);
    }
    

    What value of x gets printed? It's 7. There's no legal way for foo() to modify x.

    How about

    void bar(int* x){
        *x=7;
        foo();
        printf("%d\n",*x);
    }
    

    All bets are off. foo() may have access to the object pointed to by x by other means. We can't say, and the compiler may well have as little to go on as we do here. It depends on where and how foo() is defined and on how much of the program the compiler can see, for example whether foo() is inlined.
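
    As a minimal sketch of "access by other means", the following assumes a hypothetical file-scope pointer g_alias (not part of the code above) that happens to point at the caller's int. Here bar() returns the value instead of printing it. Everything is accessed as int, so this is fully conforming, and the compiler has to reload *x after the call unless it can prove that foo() never writes to it.

```c
#include <stddef.h>

/* Hypothetical global, introduced only for illustration: a second
   route by which foo() can reach the object that x points to. */
int *g_alias = NULL;

/* foo() writes through g_alias if it has been set.
   The write is to an int through an int*, so no aliasing rule is broken. */
void foo(void) {
    if (g_alias != NULL) {
        *g_alias = 42;
    }
}

/* Same shape as bar(int* x) above, but returns the value instead of printing it. */
int bar(int *x) {
    *x = 7;
    foo();      /* may legally modify *x through g_alias */
    return *x;  /* so *x has to be read again here */
}
```

    With g_alias left as NULL, bar(&v) returns 7. After g_alias = &v, the same call returns 42.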

    Now, what you've done is cast the pointer before passing it to change_type2_value(). Since dereferencing that pointer is not strictly conforming, the compiler is still conforming if it assumes that change_type2_value() doesn't dereference its argument. If foo is a local variable (your snippet doesn't make that clear), the compiler absolutely can assume it hasn't changed.

    This next bit is a bad idea

    What if you replaced that call with a call to:

    void do_secret_stuff_with_type1(type1* t1){
        /* Cast the parameter t1 and pass it on. */
        type2_incompatible_with_type1* pbar = (type2_incompatible_with_type1*)t1;
        change_type2_value(pbar);
    }
    
    type1 foo, *pfoo;
    /* ... */
    pfoo = &foo;
    /* ... */
    while(condition){
        do_secret_stuff_with_type1(pfoo); //Looks innocent, right?
        /* ... */
        use_type1_value_which_should_have_changed(*pfoo);
        /* ... */
    }
    

    The same goes: the code is still not conforming. It certainly makes it harder for the compiler to spot what's going on. If do_secret_stuff_with_type1() is defined in a separate translation unit, you further increase the chances that you'll confound the compiler.

    End of bad idea

    However, such hacking is almost certainly a dreadful idea. Why are you trying to do this? Forgetting about aliasing, what are you doing where casting between incompatible types results in a useful program?

    For any real case there's almost always a solution that turns a non-conformant program into a conformant one: accessing objects through unsigned char*, copying with memcpy(), or in some cases using a union.
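
    A minimal sketch of those three options, using float and uint32_t as stand-ins for type1 and type2. All function and type names below are hypothetical, and the sketch assumes that float is 32 bits wide (checked at compile time with C11 _Static_assert). The union variant is valid C, but it is not valid C++.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* memcpy: read the representation of a float as a uint32_t.
   Copying bytes never violates the aliasing rules. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* memcpy: overwrite the representation of a float in place. */
void set_float_bits(float *dst, uint32_t bits) {
    memcpy(dst, &bits, sizeof bits);
}

/* unsigned char*: any object may be inspected byte by byte. */
unsigned char first_byte(const float *f) {
    const unsigned char *p = (const unsigned char *)f;
    return p[0];
}

/* union: write one member, read the other (allowed in C, not in C++). */
union float_or_u32 {
    float f;
    uint32_t u;
};

uint32_t float_bits_union(float f) {
    union float_or_u32 pun;
    pun.f = f;
    return pun.u;
}
```

    Optimizing compilers typically turn a small fixed-size memcpy() like this into a plain register move, so the conforming version usually costs nothing at run time.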