
Calling instance method on a Null Reference sometimes successful


Sorry for the wall of text, but I wanted to give a good background on the situation. I know you can call methods on null references in IL, but still don't understand a few very strange things that happen when you do it, in regards to my understanding of how the CLR works. The few other questions I've found here regarding this didn't cover the behavior I'm seeing here.

Here is some IL:

.assembly MrSandbox {}
.class private MrSandbox.AClass {
    .field private int32 myField

    .method public int32 GetAnInt() cil managed {
        .maxstack  1
        .locals init ([0] int32 retval)
        ldc.i4.3
        stloc retval
        ldloc retval
        ret
    }

    .method public int32 GetAnotherInt() cil managed {
        .maxstack  1
        .locals init ([0] int32 retval)
        ldarg.0
        ldfld int32 MrSandbox.AClass::myField
        stloc retval
        ldloc retval
        ret
    }
}
.class private MrSandbox.Program {
    .method private static void Main(string[] args) cil managed {
        .entrypoint
        .maxstack  1
        .locals init ([0] class MrSandbox.AClass p,
                      [1] int32 myInt)
        ldnull
        stloc p
        ldloc p
        call instance int32 MrSandbox.AClass::GetAnotherInt()
        stloc myInt
        ldloc myInt
        call void [mscorlib]System.Console::WriteLine(int32)
        ret
    }
}

Now, when this code runs, we get what I expect to happen, kind of. callvirt checks for null and call doesn't, yet a NullReferenceException is thrown here on the call. This isn't clear to me, as I would expect a System.AccessViolationException instead. I'll explain my reasoning at the end of this question.

If we replace the code inside Main(string[] args) with this (after the .locals lines):

        ldnull
        stloc p
        ldloc p
        call instance int32 MrSandbox.AClass::GetAnInt()
        stloc myInt
        ldloc myInt
        call void [mscorlib]System.Console::WriteLine(int32)
        ret

This one, to my surprise, runs and prints 3 to the console, exiting successfully. I am calling a function on a null reference, and it's executing properly. My guess is that it has something to do with the fact that no instance fields are being accessed, so the CLR can successfully execute the code.

Finally, and this is where the real confusion sets in for me, replace the code in Main(string[] args) with this (after the .locals lines):

        ldnull
        stloc p
        ldloc p
        call instance int32 MrSandbox.AClass::GetAnInt()
        stloc myInt
        ldloc myInt
        call void [mscorlib]System.Console::WriteLine(int32)
        call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
        pop
        call instance int32 MrSandbox.AClass::GetAnotherInt()
        stloc myInt
        ldloc myInt
        call void [mscorlib]System.Console::WriteLine(int32)
        ret

Now, what would you expect this code to do? I expected the code to write 3 out to the console, read a key from the console, and then fail on a NullReferenceException. Well, none of that happens. Instead, no values are printed to the screen, except for a System.AccessViolationException. Why is it inconsistent?

With the background out of the way, here are my questions:

1) MSDN states that callvirt will throw a NullReferenceException if obj is null, but for call it just says that obj must not be null. Why, then, is it throwing an NRE by default instead of an access violation? It seems to me that call, by contract, would try to access the memory and fail, instead of doing what callvirt does by checking for null first.

2) Does the second example work because it accesses no class-level fields and call doesn't do a null check? If so, how can a non-static method be invoked on a null reference and return successfully? My understanding is that when a reference type is put on the stack, only the Type object is put on the heap. So is the method being called from the Type object?

3) Why the difference in exceptions thrown between the first and the last example? In my opinion, the 3rd example throws the correct exception, an AccessViolationException, since that's exactly what it's trying to do: access unallocated memory.


Before the "The behavior is undefined" answers roll in, I know that this is not AT ALL a proper way of writing things, I'm just hoping someone can help to shed some insight on the above questions.

Thanks.


Solution

  • 1) The processor is raising an access violation. The CLR traps the exception and translates it, based on the exception's access address. Any access within the first 64KB of the address space is re-raised as a managed NullReferenceException. Check this answer for reference.

    2) Yes, the CLR does not enforce a non-null this value. The C++/CLI compiler, for example, generates code that doesn't perform this check, much like native C++. As long as the method never uses the this reference, the call will not cause an exception. The C# compiler explicitly generates code to verify the value of this before the method call by emitting callvirt. See this blog post for reference.

    3) You got the IL wrong. GetAnotherInt() is an instance method, but you forgot to write the ldloc instruction that pushes the object reference before the second call. You get an AV because the reference pointer is random.
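
    To make point 3 concrete, here is a sketch of what the body of Main(string[] args) in the third example could look like with the missing instruction restored (after the .locals lines). The only change from the question's code is the added ldloc p before the second call. The comments are mine, and I have not run this exact listing.

        ldnull
        stloc p
        ldloc p
        call instance int32 MrSandbox.AClass::GetAnInt()
        stloc myInt
        ldloc myInt
        call void [mscorlib]System.Console::WriteLine(int32)
        call valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
        pop
        // Added: push the (null) this reference that the original listing omitted.
        ldloc p
        call instance int32 MrSandbox.AClass::GetAnotherInt()
        stloc myInt
        ldloc myInt
        call void [mscorlib]System.Console::WriteLine(int32)
        ret

    With this change the program should behave the way the question expected: print 3, wait for a key, and then fail with a NullReferenceException when ldfld dereferences the null this inside GetAnotherInt(), for the reason given in point 1.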