I came across a very funny situation where comparing a nullable type to null inside a generic method is 234x slower than comparing a value type or a reference type. The code is as follows:
static bool IsNull<T>(T instance)
{
    return instance == null;
}
The execution code is:
int? a = 0;
string b = "A";
int c = 0;

var watch = Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
    var r1 = IsNull(a);
}
Console.WriteLine(watch.Elapsed.ToString());

watch.Restart();
for (int i = 0; i < 1000000; i++)
{
    var r2 = IsNull(b);
}
Console.WriteLine(watch.Elapsed.ToString());

watch.Restart();
for (int i = 0; i < 1000000; i++)
{
    var r3 = IsNull(c);
}
watch.Stop();
Console.WriteLine(watch.Elapsed.ToString());

Console.ReadKey();
The output for the code above is:
00:00:00.1879827
00:00:00.0008779
00:00:00.0008532
As you can see, comparing a nullable int to null is 234x slower than comparing an int or a string. If I add a second overload with the right constraint, the results change dramatically:
static bool IsNull<T>(T? instance) where T : struct
{
    return instance == null;
}
Now the results are:
00:00:00.0006040
00:00:00.0006017
00:00:00.0006014
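To rule out first-call JIT compilation cost in the timed loops, each instantiation can also be warmed up once before the stopwatch starts (a small addition to the code above):

IsNull(a); // JIT-compiles IsNull<int?>
IsNull(b); // JIT-compiles IsNull<string>
IsNull(c); // JIT-compiles IsNull<int>
var watch = Stopwatch.StartNew();
// ... timed loops as above ...

A one-time JIT cost for a method this small wouldn't account for ~190 ms anyway, so whatever is slow must be happening on every iteration.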
Why is that? I didn't check the generated IL because I'm not fluent in it, but even if the IL were a little different, I would expect the JIT to optimize this, and it doesn't (I'm running with optimizations enabled).
If you compare the IL produced by the two overloads, you can see that boxing is involved in the unconstrained one.
The first looks like:
.method private hidebysig static bool IsNull<T>(!!T instance) cil managed
{
    .maxstack 2
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop
    L_0001: ldarg.0
    L_0002: box !!T
    L_0007: ldnull
    L_0008: ceq
    L_000a: stloc.0
    L_000b: br.s L_000d
    L_000d: ldloc.0
    L_000e: ret
}
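In C# terms, that box / ldnull / ceq sequence is roughly the following (the method name is mine, just for illustration):

static bool IsNullUnconstrained<T>(T instance)
{
    // An unconstrained T has to be boxed before it can be compared to null.
    // For T = int?, boxing yields null when HasValue is false, and a freshly
    // allocated boxed int when it is true, i.e. an allocation on every call.
    object boxed = (object)instance;
    return boxed == null;
}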
While the second looks like:
.method private hidebysig static bool IsNull<valuetype ([mscorlib]System.ValueType) .ctor T>(valuetype [mscorlib]System.Nullable`1<!!T> instance) cil managed
{
    .maxstack 2
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop
    L_0001: ldarga.s instance
    L_0003: call instance bool [mscorlib]System.Nullable`1<!!T>::get_HasValue()
    L_0008: ldc.i4.0
    L_0009: ceq
    L_000b: stloc.0
    L_000c: br.s L_000e
    L_000e: ldloc.0
    L_000f: ret
}
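In C# terms, the second overload is just a HasValue test (again, the method name is mine):

static bool IsNullConstrained<T>(T? instance) where T : struct
{
    // No boxing: the compiler calls Nullable<T>.get_HasValue directly.
    return !instance.HasValue;
}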
In the second case, the compiler knows the type is a Nullable<T>, so it can test for null by calling HasValue directly. In the first case, it has to handle any type, both reference and value types, so it goes through the box-then-compare sequence above.
As for why int is faster than int?: I'd imagine the JIT is at work there. When it specializes the method for a non-nullable value type like int, it knows the boxed value can never be null, so it can skip the boxing entirely and fold the comparison to a constant false. It can't do that for int?, because the result of boxing a Nullable<int> depends on HasValue: null when it is false, a freshly boxed int when it is true. So the box, and its allocation, happen on every call.
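If you want null checks for both reference types and nullable value types without the boxing, one option is to keep two constrained overloads side by side; the struct one is yours from the question, the class one is my addition:

static bool IsNull<T>(T? instance) where T : struct
{
    return instance == null; // compiled as !instance.HasValue, no boxing
}

static bool IsNull<T>(T instance) where T : class
{
    return instance == null; // plain reference comparison, no boxing
}

Overload resolution picks the right one at each call site; a plain int argument still works, binding to the struct overload through the implicit int-to-int? conversion.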