c, gcc, unsigned, segmentation-fault

Why is the compiler confused into a SIGSEGV by unsigned int?


I have distilled a problem I ran into down to a minimal example. I can see what is happening, but still not exactly why.

int main() {
    unsigned int a = 2;
    char c[2] = {};
    char* p = &c[1];
    return p[1 - a];
}

It is a bit clearer when the last line is rewritten.

    return *(p + (1 - a));      /* equivalent */
    return *(p + 1 - a);        /* works */
    return *(p + (1 - (int)a)); /* works */

I'm surprised that the compiler doesn't remove the parentheses internally, and more so that it apparently tries to hold a temporary negative result in an unsigned int. Unless that's not the reason for the segmentation fault here. In the assembler output there is only a small difference between the code with and without parentheses.

-   movl    $1, %eax
-   subl    -12(%rbp), %eax
-   movl    %eax, %edx
+   movl    -12(%rbp), %eax
+   movl    $1, %edx
+   subq    %rax, %rdx

Solution

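  • This is all about C's usual arithmetic conversions. In the expression 1-a, the int constant 1 is converted to unsigned int, so the subtraction is done in unsigned arithmetic and wraps around instead of going negative. The compiler cannot remove the parentheses because they change which types are involved: pointer arithmetic in one case, unsigned integer arithmetic in the other. Consider your cases: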

    return *(p + (1 - a));      /* equivalent */
    

    This calculates 1-a first, as an unsigned int. The result wraps around to the maximum value of unsigned int. That value is then added to the pointer, so you end up dereferencing something like p+4294967295 (that is, p+(2^32-1)) if unsigned int is 32-bit. With 64-bit pointers this is almost 4 GiB past p rather than one byte before it, and is not likely to be a valid memory location.
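To see the wrapped value without dereferencing anything, here is a minimal sketch. The helper names `index_unsigned` and `index_widened` are made up for illustration, and the second one assumes `unsigned long long` is wider than `unsigned int`, as on typical x86-64 systems.

```c
#include <limits.h>

/* Hypothetical helper: the value of (1 - a) as the compiler sees it.
   The int constant 1 is converted to unsigned int, so the subtraction
   is unsigned and wraps modulo UINT_MAX + 1. */
unsigned int index_unsigned(unsigned int a) {
    return 1 - a;
}

/* Hypothetical helper: the same value after widening, which is roughly
   what happens before it is added to a 64-bit pointer. The widening is
   a zero extension, so the result stays large and positive. */
unsigned long long index_widened(unsigned int a) {
    return (unsigned long long)(1 - a);
}
```

With a == 2, both helpers return UINT_MAX (4294967295 when unsigned int is 32 bits wide), not -1.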

    return *(p + 1 - a);        /* works */
    

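    This is parsed as (p+1)-a: it calculates p+1 and then subtracts a from it, so you dereference p-1. Since p points to c[1], p+1 is the one-past-the-end pointer of c (which is valid to form) and p-1 is &c[0], so in this program the access is well defined and reads c[0]. It would only be undefined behavior if p-1 fell outside the array.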
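In this particular program the arithmetic stays inside the array: p + 1 is the one-past-the-end pointer of c, which C allows you to form, and subtracting a == 2 from it gives &c[0]. A small sketch that reports where the pointer lands follows; `landing_index` is a hypothetical helper and is only meaningful for a between 0 and 2.

```c
#include <stddef.h>

/* Hypothetical helper: index within c that p + 1 - a points to.
   Only valid for 0 <= a <= 2, so every intermediate pointer stays
   within c or one past its end. */
ptrdiff_t landing_index(unsigned int a) {
    char c[2] = {0};
    char *p = &c[1];
    char *q = p + 1 - a;   /* parsed as (p + 1) - a: pointer arithmetic */
    return q - c;          /* element distance from the start of c */
}
```

For a == 2 the helper returns 0, meaning the expression refers to c[0].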

    return *(p + (1 - (int)a)); /* works */
    

    This converts a to a signed int first, so 1-a is calculated in signed arithmetic and is -1. You then dereference p-1, which is c[0].
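The cast version can be checked in a similar way. In this sketch the array holds distinct values so that the byte being read is identifiable; `read_with_cast` is an illustrative name, not something from the question.

```c
/* Hypothetical helper: same layout as the question, but with distinct
   values in c so the byte that gets read is identifiable. */
char read_with_cast(void) {
    unsigned int a = 2;
    char c[2] = {'x', 'y'};
    char *p = &c[1];
    return *(p + (1 - (int)a));   /* 1 - 2 == -1 in int, so this is p[-1] */
}
```

It returns 'x', the value stored in c[0].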