Search code examples
c++referencenulllanguage-lawyer

Is a null reference possible?


Is this piece of code valid (and defined behavior)?

int &nullReference = *(int*)0;

Both g++ and clang++ compile it without any warning, even when using -Wall, -Wextra, -std=c++98, -pedantic, -Weffc++...

Of course the reference is not actually null, since it cannot be accessed (it would mean dereferencing a null pointer), but we could check whether it's null or not by checking its address:

if( & nullReference == 0 ) // null reference

Solution

  • The answer depends on your view point:


    If you judge by the C++ standard, you cannot get a null reference because you get undefined behavior first. After that first incidence of undefined behavior, the standard allows anything to happen. So, if you write *(int*)0, you already have undefined behavior as you are, from a language standard point of view, dereferencing a null pointer. The rest of the program is irrelevant, once this expression is executed, you are out of the game.


    However, in practice, null references can easily be created from null pointers, and you won't notice until you actually try to access the value behind the null reference. Your example may be a bit too simple, as any good optimizing compiler will see the undefined behavior, and simply optimize away anything that depends on it (the null reference won't even be created, it will be optimized away).

    Yet, that optimizing away depends on the compiler to prove the undefined behavior, which may not be possible to do. Consider this simple function inside a file converter.cpp:

    int& toReference(int* pointer) {
        return *pointer;
    }
    

    When the compiler sees this function, it does not know whether the pointer is a null pointer or not. So it just generates code that turns any pointer into the corresponding reference. (Btw: This is a noop since pointers and references are the exact same beast in assembler.) Now, if you have another file user.cpp with the code

    #include "converter.h"
    
    void foo() {
        int& nullRef = toReference(nullptr);
        cout << nullRef;    //crash happens here
    }
    

    the compiler does not know that toReference() will dereference the passed pointer, and assume that it returns a valid reference, which will happen to be a null reference in practice. The call succeeds, but when you try to use the reference, the program crashes. Hopefully. The standard allows for anything to happen, including the appearance of pink elephants.

    You may ask why this is relevant, after all, the undefined behavior was already triggered inside toReference(). The answer is debugging: Null references may propagate and proliferate just as null pointers do. If you are not aware that null references can exist, and learn to avoid creating them, you may spend quite some time trying to figure out why your member function seems to crash when it's just trying to read a plain old int member (answer: the instance in the call of the member was a null reference, so this is a null pointer, and your member is computed to be located as address 8).


    So how about checking for null references? You gave the line

    if( & nullReference == 0 ) // null reference
    

    in your question. Well, that won't work: According to the standard, you have undefined behavior if you dereference a null pointer, and you cannot create a null reference without dereferencing a null pointer, so null references exist only inside the realm of undefined behavior. Since your compiler may assume that you are not triggering undefined behavior, it can assume that there is no such thing as a null reference (even though it will readily emit code that generates null references!). As such, it sees the if() condition, concludes that it cannot be true, and just throw away the entire if() statement. With the introduction of link time optimizations, it has become plain impossible to check for null references in a robust way.


    TL;DR:

    Null references are somewhat of a ghastly existence:

    Their existence seems impossible (= by the standard),
    but they exist (= by the generated machine code),
    but you cannot see them if they exist (= your attempts will be optimized away),
    but they may kill you unaware anyway (= your program crashes at weird points, or worse).
    Your only hope is that they don't exist (= write your program to not create them).

    I do hope that will not come to haunt you!