Tags: c++, pointers, language-lawyer, undefined-behavior

Are pointer variables just integers with some operators or are they "symbolic"?


EDIT: The original word choice was confusing. The term "symbolic" is much better than the original ("mystical").

In the discussion about my previous C++ question, I have been told that pointers are

This does not sound right! If nothing is symbolic and a pointer is its representation, then I can do the following. Can I?

#include <stdio.h>
#include <string.h>

int main() {
    int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;
    if (memcmp (&pa1, &pb, sizeof pa1) == 0) {
        printf ("pa1 == pb\n");
        *pa1 = 2;
    }
    else {
        printf ("pa1 != pb\n");
        pa1 = &a[0]; // ensure well defined behaviour in printf
    }
    printf ("b = %d *pa1 = %d\n", b, *pa1);
    return 0;
}

This is a C and C++ question.

Testing with Compile and Execute C Online with GNU GCC v4.8.3: gcc -O2 -Wall gives

pa1 == pb
b = 1 *pa1 = 2

Testing with Compile and Execute C++ Online with GNU GCC v4.8.3: g++ -O2 -Wall gives

pa1 == pb
b = 1 *pa1 = 2

So the modification of b via pa1 (that is, &a[0] + 1) fails with GCC in C and C++: the two pointers have the same representation, yet b still reads as 1.

Of course, I would like an answer based on standard quotes.

EDIT: To respond to criticism about UB on &a + 1, now a is an array of 1 element.

Related: Dereferencing an out of bound pointer that contains the address of an object (array of array)

Additional note: the term "mystical" was first used, I think, by Tony Delroy here. I was wrong to borrow it.


Solution

  • C was conceived as a language in which pointers and integers were very intimately related, with the exact relationship depending upon the target platform. The relationship between pointers and integers made the language very suitable for purposes of low-level or systems programming. For purposes of discussion below, I'll thus call this language "Low-Level C" [LLC].

    The C Standards Committee wrote up a description of a different language, where such a relationship is not expressly forbidden, but is not acknowledged in any useful fashion, even when an implementation generates code for a target and application field where such a relationship would be useful. I'll call this language "High-Level-Only C" [HLOC].

    In the days when the Standard was written, most things that called themselves C implementations processed a dialect of LLC. Most useful compilers process a dialect which defines useful semantics in more cases than HLOC, but not as many as LLC. Whether pointers behave more like integers or more like abstract mystical entities depends upon which exact dialect one is using. If one is doing systems programming, it is reasonable to view C as treating pointers and integers as intimately related, because LLC dialects suitable for that purpose do so, and HLOC dialects that don't do so aren't suitable for that purpose. When doing high-end number crunching, however, one would far more often be using dialects of HLOC which do not recognize such a relationship.

    The real problem, and source of so much contention, lies in the fact that LLC and HLOC are increasingly divergent, and yet are both referred to by the name C.
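
To make the distinction a little more concrete, here is a minimal sketch of the few things that even the HLOC view lets one rely on. The helper names (round_trip_preserves, same_representation, write_through_derived_pointer) are hypothetical and exist only for this illustration, and the sketch assumes the implementation provides the optional type uintptr_t. It is not meant as a statement of what any particular compiler does beyond that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p compares equal to itself after a
   round trip through uintptr_t. Assumes uintptr_t exists (it is optional). */
int round_trip_preserves(int *p) {
    assert(p != NULL);
    uintptr_t bits = (uintptr_t)(void *)p;  /* pointer -> integer */
    int *q = (int *)(void *)bits;           /* integer -> pointer */
    return q == p;
}

/* Hypothetical helper: returns 1 if the two pointer objects hold the same
   bytes. This compares representations only; it says nothing about which
   object either pointer may legitimately be used to access. */
int same_representation(int *p, int *q) {
    return memcmp(&p, &q, sizeof p) == 0;
}

/* A write through a pointer derived from the object itself is defined in
   every dialect, so the caller always sees the new value. */
int write_through_derived_pointer(void) {
    int b = 1;
    int *pb = &b;
    *pb = 2;
    return b;
}
```

In the question's program, same_representation(&a[0] + 1, &b) may well return 1 when b happens to sit right after a, which is the LLC view of the situation. The Standard nonetheless does not define a write through the one-past-the-end pointer, so an HLOC-style optimizer is free to assume b is unchanged, which matches the output GCC produced above.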