I have distilled a problem I ran into down to the program below. I have figured out what is happening, but still not exactly why.
int main() {
    unsigned int a = 2;
    char c[2] = {};
    char* p = &c[1];
    return p[1 - a];
}
It is a bit clearer when the last line is rewritten.
return *(p + (1 - a)); /* equivalent */
return *(p + 1 - a); /* works */
return *(p + (1 - (int)a)); /* works */
I'm surprised that the compiler doesn't remove the parentheses internally, and more so that it apparently tries to hold a temporary negative result of type unsigned int. Unless that's not the reason for the segmentation fault here. In the assembler output there is only a small difference between the code with and without parentheses.
- movl $1, %eax
- subl -12(%rbp), %eax
- movl %eax, %edx
+ movl -12(%rbp), %eax
+ movl $1, %edx
+ subq %rax, %rdx
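This is all about the C implicit conversion rules (the usual arithmetic conversions). In the expression 1-a, the int 1 is converted to unsigned int, so the subtraction is done in unsigned int and wraps around. The compiler cannot remove the parentheses because you're mixing types, and the grouping decides which operands get converted. Consider your cases: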
return *(p + (1 - a)); /* equivalent */
Calculates 1-a first, but treats it as an unsigned int. The subtraction wraps around and yields the maximum value for an unsigned int. This is then added to the pointer, so you end up dereferencing something like p+0xFFFFFFFF (that is, p+(1<<32)-1) if unsigned int is 32-bit and, as in your x86-64 assembly, pointers are 64-bit. This is not likely to be a valid memory location.
return *(p + 1 - a); /* works */
This calculates p+1 and then subtracts a from it, resulting in dereferencing p-1. Since p points at c[1], p+1 is the one-past-the-end pointer (which is valid to form) and p-1 is &c[0], so this is well-defined and simply reads c[0].
return *(p + (1 - (int)a)); /* works */
This coerces a to a signed int, and then calculates 1-a, which is -1. You then dereference p-1.