
Does the C11/C17 Standard allow the compiler to clobber compound literals' memory?


I've found compound literals a very useful and elegant way to send initialized arrays and structs to functions without writing overly verbose code, but I want to understand the cost of writing code this way. Consider, for instance, this simple nonsense program:

#include <stdio.h>

struct st{
  int a;
  int b;
};

int foo(struct st bar){
  return bar.a * bar.b;
}

int main(void){
  printf("%d %d\n", foo((struct st){.a=4, .b=6}), foo((struct st){.a=7, .b=19})); 
}

It compiles without complaint and outputs 24 133, as expected.

I can modify the dummy code so that the function takes a pointer to the struct st instead of value:

#include <stdio.h>

struct st{
  int a;
  int b;
};

int foo(struct st *bar){
  return bar->a * bar->b;
}

int main(void){
  printf("%d %d\n", foo(&(struct st){.a=4, .b=6}), foo(&(struct st){.a=7, .b=19})); 
}

The compiler doesn't mind, and the output doesn't change. Obviously, what I'm asking the compiler to do differs between these cases. In the first example, the compiler can see that the compound literals are passed to the function by value, so after the calls to foo() they are unreachable by any later code in main(). It seems to me that the compiler could in principle notice this and reuse the same memory for both literals, perhaps reinterpreting main like this:

int main(void){
  struct st a = {4,6};
  int ret1 = foo(a);
  a.a = 7;
  a.b = 19;
  int ret2 = foo(a);
  printf("%d %d\n",ret1, ret2);
}

In the second version, however, the literals are passed by reference, so the compiler likely couldn't easily infer that their memory is reusable, and might interpret the code like this:

int main(void){
  struct st a = {4,6};
  struct st b = {7,19};
  int ret1 = foo(&a);
  int ret2 = foo(&b);
  printf("%d %d\n",ret1, ret2);
}

which in my understanding forces the compiler to allocate more stack memory to complete the same task.

Is this an accurate assessment of what is happening? Does the C standard allow the compiler to reuse memory that is technically still in scope?

If not, does that mean that code like:

for(int c = 0; c < SOME_RUNTIME_VALUE; ++c)
    printf("%d\n", foo((struct st){.a = c/4, .b = c%4}));

forces the compiler to allocate an unknown amount of memory on the stack? From a stack-memory perspective, does it make any difference whether they are passed by value or by reference?


Solution

  • Does the C standard allow the compiler to reuse memory that is technically still in scope?

    Memory is not in scope. Scope is for identifiers. Objects have lifetime.1

    A compound literal that appears inside a function body has automatic storage duration associated with its enclosing block (C11 6.5.2.5p5). So, even if a compiler cannot see the definition of the function foo, it knows that, in the abstract model of the C standard, the compound literals exist only until execution of main's block ends. In the pass-by-value version, it can analyze main and see that the objects are not used anywhere else, so it is permitted to optimize their storage: it could reuse their space as soon as the program is done with them, just as it could for named objects (variables).

    However, in the version that passes pointers, if there were calls to other functions after foo and the compiler did not have their definitions, it could not know that those functions did not use the compound literals' objects, since, as you note, foo could have stored their addresses. So the compiler would have to keep those objects at least until the last call to a function whose definition it cannot see, or until their abstract lifetime ends, whichever comes first.

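    Analysis for the compound literals inside the for loop is the same, using the loop body instead of the block defining main. The loop body is itself a block even when it has no braces (C11 6.8.5p5), so each iteration's compound literal ceases to exist when that iteration ends. The storage required therefore does not grow with SOME_RUNTIME_VALUE, whether the literal is passed by value or by reference.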

    Footnote

    1 Scope is where in program source code identifiers are visible. Lifetime is when during program execution objects exist. There is some association between scope and lifetime because some lifetimes are determined by a relationship between program control (the execution point) and source code, but they are different things. For example, an object may exist even when program control is outside its scope—your foo accesses the compound literals passed by reference even though their definitions are not within its scope.