CIL - Boxing/Unboxing vs Nullable


If I understand the way the CLR boxes things and treats nullables, as described at Boxing / Unboxing Nullable Types - Why this implementation?, there is still something that confuses me. For example, the following C# 7 code

void C<T>(object o) where T : struct {
    if (o is T t)
        Console.WriteLine($"Argument is {typeof(T)}: {t}");
}

compiles into the following CIL

IL_0000: ldarg.0
IL_0001: isinst valuetype [mscorlib]System.Nullable`1<!!T>
IL_0006: unbox.any valuetype [mscorlib]System.Nullable`1<!!T>
IL_000b: stloc.1
IL_000c: ldloca.s 1
IL_000e: call instance !0 valuetype [mscorlib]System.Nullable`1<!!T>::GetValueOrDefault()
IL_0013: stloc.0
IL_0014: ldloca.s 1
IL_0016: call instance bool valuetype [mscorlib]System.Nullable`1<!!T>::get_HasValue()
IL_001b: brfalse.s IL_003c

IL_001d: ldstr "Argument is {0}: {1}"
IL_0022: ldtoken !!T
IL_0027: call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_002c: ldloc.0
IL_002d: box !!T
IL_0032: call string [mscorlib]System.String::Format(string, object, object)
IL_0037: call void [mscorlib]System.Console::WriteLine(string)

IL_003c: ret

yet the following C#

void D<T>(object o) where T : struct {
    if (o is T)
        Console.WriteLine($"Argument is {typeof(T)}: {(T) o}");
}

compiles into the following CIL

IL_0000: ldarg.0
IL_0001: isinst !!T
IL_0006: brfalse.s IL_002c

IL_0008: ldstr "Argument is {0}: {1}"
IL_000d: ldtoken !!T
IL_0012: call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_0017: ldarg.0
IL_0018: unbox.any !!T
IL_001d: box !!T
IL_0022: call string [mscorlib]System.String::Format(string, object, object)
IL_0027: call void [mscorlib]System.Console::WriteLine(string)

IL_002c: ret

What I think is happening: Looking at the CIL of the first method, it seems to (1) check whether the argument is a [boxed?] Nullable<T>, pushing it on the stack if it is and null otherwise, (2) unbox it (what if it's null?), (3) try to get its value, or default(T) otherwise, and (4) check whether it has a value, branching out if it doesn't. The CIL of the second method is straightforward enough: it simply tries to unbox the argument.

If the semantics of both pieces of code are equivalent, why does the former case involve unboxing to a Nullable<T> whereas the latter case "just unboxes"? Secondly, in the first CIL, if the object argument were a boxed int, which I currently believe to be exactly what it says on the tin (i.e. a boxed int rather than a boxed Nullable<int>), wouldn't the isinst instruction always fail? Does Nullable<T> get special treatment even at the CIL level?

Update: After handwriting some MSIL, it seems that the object argument, if it is indeed a boxed int, can be unboxed into either an int or a Nullable<int>.

.method private static void Foo(object o) cil managed {
    .maxstack 1
    ldarg.0
    isinst int32
    brfalse.s L_00
    ldarg.0
    unbox.any int32
    call void [mscorlib]System.Console::WriteLine(int32)
L_00:
    ldarg.0
    isinst valuetype [mscorlib]System.Nullable`1<int32>
    brfalse.s L_01
    ldarg.0
    unbox valuetype [mscorlib]System.Nullable`1<int32>
    call instance !0 valuetype [mscorlib]System.Nullable`1<int32>::GetValueOrDefault()
    call void [mscorlib]System.Console::WriteLine(int32)
L_01:
    ldarg.0
    unbox valuetype [mscorlib]System.Nullable`1<int32>
    call instance bool valuetype [mscorlib]System.Nullable`1<int32>::get_HasValue()
    brtrue.s L_02
    ldstr "No value!"
    call void [mscorlib]System.Console::WriteLine(string)
L_02:
    ret
}

Solution

  • The new syntax in C# 7 is doing type checking and type conversion at once. In older versions, this was usually done in two possible ways.

    if(o is T)
        //use (T)o

    T t = o as T;
    if(t != null)
        //use t

    For reference types, the first one has a redundant conversion, because the is operator is compiled to isinst and a conditional branch, as you can see from the CIL in your question. The second form produces the same CIL as the first, minus the additional (T)o cast (compiled to castclass).

    For value types, the second option can only be used with a nullable type, and I think it is actually somewhat slower than the first one (a Nullable<T> structure has to be created).
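
    To make the two older patterns concrete for a value type, here is a minimal sketch. The class and method names (OldPatterns, IsThenCast, AsThenNullCheck) are made up for illustration and do not come from the question.

```csharp
using System;

static class OldPatterns
{
    // Pattern 1: test with `is`, then cast. The type is checked twice.
    public static string IsThenCast<T>(object o) where T : struct
    {
        if (o is T)
            return $"{typeof(T).Name}: {(T)o}";
        return "no match";
    }

    // Pattern 2: `as` followed by a null check. For a value type the target
    // has to be T?, because `as` needs a type that can represent null.
    public static string AsThenNullCheck<T>(object o) where T : struct
    {
        T? t = o as T?;
        if (t != null)
            return $"{typeof(T).Name}: {t.Value}";
        return "no match";
    }

    static void Main()
    {
        Console.WriteLine(IsThenCast<int>(42));
        Console.WriteLine(AsThenNullCheck<int>(42));
        Console.WriteLine(IsThenCast<int>("text"));
        Console.WriteLine(AsThenNullCheck<int>(null));
    }
}
```

    Both methods should agree on every input; they differ only in how the compiler gets from the object reference to a T.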

    I have compiled the following method to CIL:

    static void C<T>(object o) where T : struct
    {
        T? t = o as T?;
        if(t != null)
            Console.WriteLine("Argument is {0}: {1}", typeof(T), t);
    }
    

    Producing this code:

    .method private hidebysig static void  C<valuetype .ctor ([mscorlib]System.ValueType) T>(object o) cil managed
    {
      // Code size       48 (0x30)
      .maxstack  3
      .locals init (valuetype [mscorlib]System.Nullable`1<!!T> V_0)
      IL_0000:  ldarg.0
      IL_0001:  isinst     valuetype [mscorlib]System.Nullable`1<!!T>
      IL_0006:  unbox.any  valuetype [mscorlib]System.Nullable`1<!!T>
      IL_000b:  stloc.0
      IL_000c:  ldloca.s   V_0
      IL_000e:  call       instance bool valuetype [mscorlib]System.Nullable`1<!!T>::get_HasValue()
      IL_0013:  brfalse.s  IL_002f
      IL_0015:  ldstr      "Argument is {0}: {1}"
      IL_001a:  ldtoken    !!T
      IL_001f:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
      IL_0024:  ldloc.0
      IL_0025:  box        valuetype [mscorlib]System.Nullable`1<!!T>
      IL_002a:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                    object,
                                                                    object)
      IL_002f:  ret
    }
    

    This is exactly the same code as in the question, except for the call to GetValueOrDefault, because I don't obtain the actual value of the nullable instance.

    Nullable types cannot be boxed or unboxed directly: boxing a Nullable<T> produces either a boxed T (when it has a value) or a plain null reference, and unboxing to Nullable<T> accepts either of those. The first isinst ensures that an object of another type produces a null reference rather than an exception (I suppose isinst !!T could also be used). The unbox.any opcode then forms a nullable instance from the reference, which is then used as usual. The same thing could be written as a null check followed by constructing the nullable instance manually, but it's shorter this way.
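
    The boxing behaviour described above can be observed from plain C#. This is only a small sketch; NullableBoxing and DescribeBox are hypothetical names chosen for the example.

```csharp
using System;

static class NullableBoxing
{
    // Reports the runtime type of whatever ended up in the box.
    public static string DescribeBox(object o) =>
        o == null ? "null" : o.GetType().Name;

    static void Main()
    {
        int? some = 5;
        int? none = null;

        // Boxing a nullable that has a value boxes the underlying int.
        Console.WriteLine(DescribeBox(some));   // Int32
        // Boxing an empty nullable yields a null reference, not a box.
        Console.WriteLine(DescribeBox(none));   // null

        object boxedInt = 5;
        // A boxed int can be unboxed to either int or int?.
        int plain = (int)boxedInt;
        int? wrapped = (int?)boxedInt;
        Console.WriteLine(plain == wrapped);    // True

        // A null reference unboxes to an empty nullable instead of throwing.
        int? fromNull = (int?)(object)null;
        Console.WriteLine(fromNull.HasValue);   // False

        // A type test against Nullable<int> succeeds for a boxed int.
        Console.WriteLine(boxedInt is Nullable<int>);   // True
    }
}
```

    This matches the update in the question: there is no such thing as a boxed Nullable<int> at run time, only a boxed int or null.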

    C# 7 uses the second approach for is T t, so it has no choice but to use the nullable type when T is a value type. Why does it not choose the first option? I can only guess that there are substantial differences in semantics or implementation (variable allocation, etc.), so the compiler authors chose to stay consistent with the implementation of the new construct.
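
    Read back into C#, the CIL from the question corresponds roughly to the second method below. This is a hand-written approximation, not actual compiler output, and the names PatternLowering, WithPattern and Lowered are invented for the sketch; other compiler versions may emit different CIL for the same source.

```csharp
using System;

static class PatternLowering
{
    // The C# 7 form from the question.
    public static string WithPattern<T>(object o) where T : struct
    {
        if (o is T t)
            return $"Argument is {typeof(T)}: {t}";
        return "no match";
    }

    // Approximation of the question's CIL: isinst + unbox.any to T?,
    // then GetValueOrDefault, then the HasValue test and branch.
    public static string Lowered<T>(object o) where T : struct
    {
        T? tmp = o as T?;                 // isinst / unbox.any Nullable<T>
        T t = tmp.GetValueOrDefault();    // stored before the check
        if (tmp.HasValue)                 // brfalse.s skips the body
            return $"Argument is {typeof(T)}: {t}";
        return "no match";
    }

    static void Main()
    {
        Console.WriteLine(WithPattern<int>(7));
        Console.WriteLine(Lowered<int>(7));
        Console.WriteLine(WithPattern<int>("seven"));
        Console.WriteLine(Lowered<int>(null));
    }
}
```

    Note that t is assigned before HasValue is tested, which is why the CIL calls GetValueOrDefault rather than get_Value: it must not throw when there is no match.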

    For comparison, here's what is produced when I change T : struct to T : class in the method above (and T? to T):

    .method private hidebysig static void  C<class T>(object o) cil managed
    {
      // Code size       47 (0x2f)
      .maxstack  3
      .locals init (!!T V_0)
      IL_0000:  ldarg.0
      IL_0001:  isinst     !!T
      IL_0006:  unbox.any  !!T
      IL_000b:  stloc.0
      IL_000c:  ldloc.0
      IL_000d:  box        !!T
      IL_0012:  brfalse.s  IL_002e
      IL_0014:  ldstr      "Argument is {0}: {1}"
      IL_0019:  ldtoken    !!T
      IL_001e:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
      IL_0023:  ldloc.0
      IL_0024:  box        !!T
      IL_0029:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                    object,
                                                                    object)
      IL_002e:  ret
    }
    

    Again, this is fairly consistent with the original method.