Supposing we have:

```cpp
char* a;
int i;
```
Many introductions to C++ (like this one) suggest that the rvalues `a+i` and `&a[i]` are interchangeable. I naively believed this for several decades, until I recently stumbled upon the following text (here), quoted from [dcl.ref]:
> in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
In other words, "binding" a reference to a null dereference causes undefined behavior. Based on the context of the above text, one infers that merely evaluating `&a[i]` (within the `offsetof` macro) is considered "binding" a reference. Furthermore, there seems to be a consensus that `&a[i]` causes undefined behavior in the case where `a` is null and `i = 0`. This behavior is different from that of `a+i` (at least in C++, in the null, zero case).
This leads to at least two questions about the differences between `a+i` and `&a[i]`:
First, what is the underlying semantic difference between `a+i` and `&a[i]` that causes this difference in behavior? Can it be explained in terms of any kind of general principle, rather than just "binding a reference to a null dereference causes undefined behavior because this is a very specific case that everybody knows"? Is it that `&a[i]` might generate a memory access to `a[i]`? Or was the spec author simply not happy with null dereferences that day? Or something else?
Second, besides the case where `a` is null and `i = 0`, are there any other cases where `a+i` and `&a[i]` behave differently? (This could be covered by the first question, depending on its answer.)
In the C++ standard, section [expr.sub]/1, you can read:

> The expression `E1[E2]` is identical (by definition) to `*((E1)+(E2))`.
This means that `&a[i]` is exactly the same as `&*(a+i)`. So you would dereference (`*`) the pointer first and take its address (`&`) second. In case the pointer is invalid (i.e. `nullptr`, but also out of range), this is UB.
`a+i` is based on pointer arithmetic. At first it looks less dangerous, since there is no dereferencing that would be UB for sure. However, it may also be UB (see [expr.add]/4):
> When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.
So, while the semantics behind these two expressions are slightly different, I would say that the result is the same in the end.