
When inlining a C function, can an optimizing compiler dereference a pointer more times than explicitly written in the source code?


Consider the following code:

int y = 1;
int* p = &y;

inline int f(int x) {
  return x + x;
}

int g(void) {
  return f(*p);
}

In this code, there is one explicit dereference.

Is a C compiler allowed to compile/optimize this code as, roughly:

int g(void) {
  return *p + *p;
}

That is, while inlining f, can the compiler perform and use two separate accesses to the target memory, as opposed to the single access that was actually written? Are there specific sections in the C specification that allow or disallow this behavior? Note that in the example, p is not volatile, and the "optimized" code is at least functionally correct.

(Please note, the question is not "will a compiler do this" or "should a compiler do this", but rather, according to the C specifications, can a compiler do this. Suppose register spilling makes a second dereference more efficient, or some such thing.)

Referencing the C99 specification, I see some possible hints on required handling of parameters. I wonder about whether "each parameter is assigned the value of the corresponding argument" (6.5.2.2) or "its [the parameter's] identifier is an lvalue" (6.9.1) might prevent this scenario, but that interpretation seems incompatible with some of the common parameter optimizations I have seen.


Solution

  • TL;DR: Yes, it's allowed.

    It's usually hard to directly answer a question like "is this transformation allowed" by reading the C standard. In principle, it requires you to look at the rules of the language, and prove that every possible piece of code whose observable behavior is well-defined by the standard would have the same observable behavior after the transformation. Or, if its behavior is unspecified (i.e. there is some set of allowable behaviors), then after the transformation, its behavior would still be within that set.

    However, you can reason a little more informally. If the transformation described here (sometimes called inventing loads) were not valid, there would have to be some code whose well-defined observable behavior would change. You can get a sense of the legality by coming up with a piece of code whose actual behavior would change, and then asking whether that behavior is in fact defined by the standard.

    If two loads were performed, then the only way for the result to be different is if those two loads did not return the same value. (We can presume that at least one of them would return the same value as the single load in the un-transformed x + x version.) How can that happen on a real-life machine? If the load is being performed on ordinary memory (as opposed to a memory-mapped hardware register or something, in which case volatile would be required), then the only way for this to occur is if some other entity in the system performed a store to that address. Within the C standard, the only defined mechanism for two different entities to access the same memory simultaneously is threads.

    So an example of code whose observable behavior could be affected by this transformation would be one where some other thread is simultaneously performing a store to *p. However, this would constitute a data race, which C explicitly specifies as causing undefined behavior (see 5.1.2.4 p35), and so it's not actually a valid example after all. If the racy code didn't have well-defined behavior in the first place, then the fact that this transformation changes its behavior isn't any obstruction to the legality of the transformation.
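    To make the reasoning concrete, here's a small, well-defined (single-threaded) sketch of the two versions. The function names are illustrative, not from the question. With no concurrent writer, the two versions must agree; only an intervening store to *p could make them diverge, and an unsynchronized store from another thread would be exactly the data race described above.

    ```c
    #include <stdio.h>

    static int y = 1;
    static int *p = &y;

    /* The source as written: one load of *p, value used twice. */
    static int g_single_load(void) {
        int tmp = *p;       /* one access */
        return tmp + tmp;
    }

    /* The "invented load" version the question asks about. */
    static int g_double_load(void) {
        return *p + *p;     /* two accesses */
    }

    int main(void) {
        /* Without a concurrent writer, both versions must agree. */
        printf("%d %d\n", g_single_load(), g_double_load());  /* prints: 2 2 */
        return 0;
    }
    ```

    Any program where the two could print different values necessarily contains an unsynchronized concurrent store, so its behavior is undefined anyway.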

    So while this isn't a proof, it's a strong suggestion that this transformation is legal. And indeed, it's a transformation that real-life compilers will actually do. Here is an example (godbolt), borrowed from my answer to Which types on a 64-bit computer are naturally atomic in gnu C and gnu C++? -- meaning they have atomic reads, and atomic writes (the answer is emphatically none).
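    As a practical footnote (not part of the question, but a natural follow-up): if you need to forbid invented loads, the standard tools are volatile or C11 atomics, both of which require the implementation to perform exactly the accesses written. A hedged sketch using stdatomic.h:

    ```c
    #include <stdatomic.h>
    #include <stdio.h>

    static _Atomic int y = 1;
    static _Atomic int *p = &y;

    static int g_atomic(void) {
        /* An atomic load pins down "exactly one access": the compiler
           may not split it into two loads of *p. */
        int tmp = atomic_load_explicit(p, memory_order_relaxed);
        return tmp + tmp;
    }

    int main(void) {
        printf("%d\n", g_atomic());  /* prints: 2 */
        return 0;
    }
    ```

    With this version, a concurrent atomic store to *p is no longer a data race, and the single-load behavior is guaranteed rather than merely likely.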