
Why does comparing pointers with undefined behavior still give correct results?


I am trying to understand pointer comparison operators in C programs.

ISO/IEC 9899:2011 specifies that comparing pointers (using > or <) pointing to different objects is undefined behavior.

However, while experimenting I found that when "irrelevant" pointers are compared, all the compilers/interpreters I tested seem to treat them as just "numbers that happen to represent a location in memory".
Is this always the case? If so, why isn't this part of the standard?

To put this differently, can there be an edge case where pointer p points to a virtual memory address of, say, 0xffff and pointer b points to 0x0000, yet (p < b) returns true?


Solution

  • Is this always the case? If so, why isn't this part of the standard?

    Most of the time, but not necessarily. There are various oddball architectures with segmented memory areas. The C standard also wants to allow pointers to be abstract items that are not necessarily equivalent to physical addresses.
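
    As a rough illustration of how segmentation can break the "pointers are just numbers" intuition, here is a small simulation of 16-bit x86 real-mode far pointers, where the physical address is segment * 16 + offset. The type struct far_ptr and the functions linear_address, offset_only_less and demo are hypothetical names made up for this sketch. The offset-only ordering is an assumption modeled on how some compilers for that platform reportedly implemented < on far pointers, not a description of any particular compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated far pointer (illustrative type, not a real compiler's). */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Physical address in 8086 real mode: segment * 16 + offset. */
uint32_t linear_address(struct far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}

/* Assumed comparison strategy: only the offsets are compared. */
bool offset_only_less(struct far_ptr p, struct far_ptr q)
{
    return p.offset < q.offset;
}

/* p is physically above q, yet the offset-only comparison says p < q. */
void demo(void)
{
    struct far_ptr p = { 0x1000, 0x0000 }; /* physical 0x10000 */
    struct far_ptr q = { 0x0000, 0xFFFF }; /* physical 0x0FFFF */

    assert(linear_address(p) > linear_address(q));
    assert(offset_only_less(p, q));
}
```

    Under these assumptions, a comparison can report p < q although p refers to the higher physical address, which is the kind of edge case the question asks about.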

    Also, in theory, if you have something like this:

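    #include <stdio.h>

    int main(void)
    {
        int a;
        int b;
        int* pa = &a; // only the addresses are used; a and b are never read
        int* pb = &b;

        if (pa < pb) // undefined behavior: pa and pb point to different objects
            puts("less");
        else
            puts("more");
    }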
    

    Then the compiler could in theory replace the whole if-else with puts("more"), even if the address stored in pa is lower than the address stored in pb, because it is free to deduce that pa and pb cannot be compared, or that comparing them always gives false. This is the danger of undefined behavior: what code the compiler generates is anyone's guess.

    In practice, the undefined behavior in the above snippet seems to lead to less efficient code with gcc and clang for x86 at -O3. It compiles into two loads of the addresses followed by a run-time comparison, even though the compiler should be able to calculate all addresses at compile time.

    When changing the code to well-defined behavior:

    int a[2];
    int* pa = &a[0];
    int* pb = &a[1];
    

    Then I get much better machine code - the comparison is now calculated at compile time and the whole program is replaced by a simple call to puts("less").
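
    The cases where the standard does define the result of < can be sketched like this: pointers into the same array (including one past its end) order by index, and pointers to members of the same struct order by declaration. The names element_precedes, struct pair and first_member_precedes_second are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if arr + i compares less than arr + j.
   Well-defined because both pointers stay inside arr or one past its end. */
bool element_precedes(const int *arr, size_t len, size_t i, size_t j)
{
    assert(i <= len && j <= len); /* keep both pointers within the array object */
    const int *pi = arr + i;
    const int *pj = arr + j;
    return pi < pj;
}

struct pair {
    int first;
    int second;
};

/* Well-defined: members declared later compare greater than earlier ones. */
bool first_member_precedes_second(const struct pair *p)
{
    return &p->first < &p->second;
}
```

    For an int a[2], element_precedes(a, 2, 0, 1) is true by the language rules, so the compiler is entitled to fold such a comparison at compile time.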

    With embedded systems compilers, however, you can almost certainly access any address as if it were an integer, as a well-defined, non-standard extension. Otherwise it would be impossible to write things like flash drivers, bootloaders, CRC memory checks etc.
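
    If an ordering between unrelated pointers is genuinely needed, one common approach is to convert both pointers to uintptr_t and compare the integers. That avoids the undefined relational comparison, but the pointer-to-integer mapping is implementation-defined and uintptr_t is an optional type, so the result is only meaningful on implementations with a flat address space. This is a minimal sketch under that assumption; address_less and address_in_range are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Orders two pointers by their integer representation.
   Assumes a flat address space where that representation is the address. */
bool address_less(const void *p, const void *q)
{
    return (uintptr_t)p < (uintptr_t)q;
}

/* True if p lies inside the size-byte region starting at base,
   under the same flat-address-space assumption. */
bool address_in_range(const void *p, const void *base, size_t size)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)base;

    assert(size <= UINTPTR_MAX - start); /* region must not wrap around */
    return addr >= start && addr - start < size;
}
```

    A memory check could use address_in_range(ptr, region_start, region_size) to test whether an arbitrary pointer falls inside a known region, which is something a plain < between unrelated pointers cannot do portably.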