c++ language-lawyer c++20 undefined-behavior

Is dereferencing a null pointer UB in C++20?


I started by researching the question "is &((T*)NULL)->member UB in C?". This expression is an example in my textbook, which introduced the old implementation of offsetof.

I know that offsetof can't be implemented in standard C++ now (according to the cppreference page).
But after reading some C++ CWG issues, my question has become "is dereferencing a null pointer UB?".
Also, I think they wouldn't have changed the implementation of offsetof in C from &((T*)NULL)->member without a reason, but I don't know what the reason is. Maybe because it's UB? But I couldn't find a clause saying that &((T*)NULL)->member is UB in C. For C++, I think it's UB if the type is not a standard-layout type.
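For comparison, here is a minimal sketch of how a member offset can be observed without any null pointer, next to the standard offsetof macro. The type Sample and both function names are made up for illustration, and this is not how any library implements offsetof. The pointer-to-integer mapping is implementation-defined, so the two results are expected to agree on common flat-address platforms rather than guaranteed to by the standard.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::uintptr_t

// Hypothetical standard-layout type, used only for illustration.
struct Sample {
    char tag;
    int value;
    double weight;
};

// Offset as reported by the standard macro (needs compiler support).
std::size_t offset_of_value_via_macro() {
    return offsetof(Sample, value);
}

// Offset measured on a real object, so no null pointer is involved.
// The integer values of the pointers are implementation-defined.
std::size_t offset_of_value_via_object() {
    Sample s{};
    auto base = reinterpret_cast<std::uintptr_t>(&s);
    auto member = reinterpret_cast<std::uintptr_t>(&s.value);
    return static_cast<std::size_t>(member - base);
}
```

Unlike &((T*)NULL)->member, neither function forms an lvalue through a null pointer, so the question asked here does not arise for them.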

At the beginning, I thought there would be a clause explicitly stating something like "dereferencing a null pointer is UB".
However, as I dug deeper, I found that it's more complicated than I thought.
After reading a lot of Stack Overflow answers, I found that they don't agree.
Some posts say it's well-defined, some say it's UB, and some say it's not specified.

The posts that say it's well-defined cite CWG issue #232 and CWG issue #315 as the reasons, like the answer to "c++ access static members using null pointer".
The posts that say it's not specified argue that the standard doesn't explicitly specify it.
The posts that say it's UB argue that the resolution of those issues has not been incorporated into the standard, so it's still UB. They also quote the sentence "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.".

The example from the Stack Overflow question above is:

#include <iostream>
class demo {
public:
  static void fun()
  {
    std::cout << "fun() is called\n";
  }
  static int a;
};

int demo::a = 9;

int main()
{
  demo *d = nullptr;
  d->fun();
  std::cout << d->a;
  return 0;
}

Their rough reasoning for saying it's well-defined was:

  1. E1->E2 is equivalent to (*(E1)).E2.
  2. Thus, if *d; is legal, then d->fun() is legal.
  3. CWG issue #232 says that p = 0; *p; is not inherently an error; an lvalue-to-rvalue conversion would give it undefined behavior.
  4. CWG issue #315 says that *d in the above example is not an error when d is null unless the lvalue is converted to an rvalue (7.3.2 [conv.lval]), which it isn't here.
  5. Thus *d; is legal, and so d->fun() is legal.
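Whatever the status of the reasoning above turns out to be, static members can be reached without evaluating *d at all, which sidesteps the question. A minimal sketch, assuming a class shaped like the one in the example (Demo and the two wrapper functions are illustrative names, and fun() returns its text here instead of printing it so the result can be checked):

```cpp
#include <string>

// Illustrative class modeled on the example above.
class Demo {
public:
    static std::string fun() { return "fun() is called"; }
    static int a;
};

int Demo::a = 9;

// Static members are named through the class, so no pointer
// (null or otherwise) is dereferenced.
std::string call_fun_without_object() {
    return Demo::fun();
}

int read_a_without_object() {
    return Demo::a;
}
```

If only a pointer is available, decltype can recover the class type without evaluating the pointer, since its operand is unevaluated: std::remove_pointer_t<decltype(d)>::fun().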

These issues were discussed around 2005, when the C++03 spec was still current.
However, in C++20, the standard explicitly requires E1 in E1->E2 to be a prvalue:

n4861(expr.ref#2): For the second option (arrow) the first expression shall be a prvalue having pointer type.

So I think there may be an lvalue-to-rvalue conversion here, since E1 shall be a prvalue?
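One way to separate the two conversions is to look at the value categories involved. As far as I can tell, the conversion required by expr.ref applies to the pointer expression d itself (it reads the pointer's value), which is distinct from converting the lvalue *d that CWG 232 and CWG 315 talk about. The sketch below only inspects types inside decltype, whose operand is unevaluated, so nothing is dereferenced. Demo and the alias names are hypothetical.

```cpp
#include <type_traits>
#include <utility>   // std::declval

// Hypothetical class, used only for illustration.
struct Demo {
    static int a;
};

int Demo::a = 9;

// decltype((expr)) yields T& for an lvalue and T for a prvalue.
using PointerVariable = decltype((std::declval<Demo*&>()));       // like the variable d
using PointerValue    = decltype((static_cast<Demo*>(nullptr)));  // a pointer prvalue
using Dereferenced    = decltype((*static_cast<Demo*>(nullptr))); // like *d

// A named pointer is an lvalue of pointer type.
constexpr bool kPointerVariableIsLvalue =
    std::is_same<PointerVariable, Demo*&>::value;

// After lvalue-to-rvalue conversion the operand is a prvalue of pointer type.
constexpr bool kPointerValueIsPrvalue =
    std::is_same<PointerValue, Demo*>::value;

// The result of unary * is an lvalue, whether or not it is later converted.
constexpr bool kDereferencedIsLvalue =
    std::is_same<Dereferenced, Demo&>::value;

static_assert(kPointerVariableIsLvalue, "d is an lvalue");
static_assert(kPointerValueIsPrvalue, "the converted operand is a prvalue");
static_assert(kDereferencedIsLvalue, "*d is an lvalue");
```

This shows only the categories of the expressions; it does not settle whether evaluating *d on a null pointer has defined behavior.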

By the way, the standard used to give "dereferencing the null pointer" as an example of undefined behavior:

n1146(intro.execution#4): Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer).

But the example was changed by CWG issue #1102. The stated reason was:

There are core issues surrounding the undefined behavior of dereferencing a null pointer. It appears the intent is that dereferencing is well defined, but using the result of the dereference will yield undefined behavior. This topic is too confused to be the reference example of undefined behavior, or should be stated more precisely if it is to be retained.

The issue was discussed in 2010, 13 years ago, so I think this has been a problem for a long time, but sadly, I still can't find the answer.

All in all, can a language lawyer give me a conclusion about this problem? Is dereferencing a null pointer UB in C++20? For example, &((T*)NULL)->member and the d->fun() above. Or is it IB or unspecified behavior?

Ideally, please include the history and the relevant wording from the standard.

Edit:
My summary is that this is still an unresolved issue. For now, it is always UB by omission in expr.unary#op-1.sentence-3, which only defines the behavior if there is an object to which the pointer points. But that's probably not the intended specification.

By the way, there is a more recent discussion of this topic with the same outcome: https://github.com/cplusplus/CWG/issues/198

Please see the comment by @user17732522 and the answer by @Brian Bi.


Solution

  • As of now, the issue of whether dereferencing a null pointer is UB is still unresolved. And it is not clear whether the direction indicated in CWG 232, i.e. that it should be UB only if an attempt is made to access the value through the result of the dereference, is still the consensus of CWG (although there is at least one situation where it's explicitly legal, namely when the resulting lvalue is of polymorphic type and is the operand of typeid). And if CWG were to agree on a direction, then it is not clear whether EWG would accept that direction. So, really, no one knows the answer.

    There is at least one good reason why &((T*)NULL)->member should be UB. An implementation presumably computes &E->m by adding a fixed offset to the value of E. If E is a null pointer, this arithmetic will generate an address value that the hardware may recognize as invalid, which results in a trap on implementations where loading an invalid pointer value into a register traps. I would imagine that an eventual resolution of CWG 232, if one were to actually occur, would clarify that this situation is UB.
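The typeid case mentioned in the first paragraph can be sketched as follows. The standard specifies that applying typeid to the result of unary * on a null pointer to a polymorphic class type throws an exception matching std::bad_typeid, so this is a place where *p on a null pointer has specified behavior. Base and null_typeid_throws are hypothetical names chosen for the example.

```cpp
#include <typeinfo>   // std::bad_typeid, std::type_info

// Hypothetical polymorphic type (it has a virtual function).
struct Base {
    virtual ~Base() = default;
};

// Returns true if typeid(*p) on a null pointer throws std::bad_typeid.
bool null_typeid_throws() {
    Base* p = nullptr;
    try {
        // Base is polymorphic, so the operand is evaluated and the
        // null pointer is detected by typeid itself.
        const std::type_info& info = typeid(*p);
        (void)info;
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}
```

This guarantee is specific to typeid with a polymorphic operand. It says nothing about &((T*)NULL)->member or d->fun(), which remain covered by the unresolved issue.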