c# reflection.emit il ilspy

Generating code for a method call: why does the decompiled C# show more local variables than the IL actually declares?


I'm creating an open-instance delegate from a DynamicMethod to call a method on a certain target. The code handles by ref parameters, as well as static methods.

See the following:

public class Test
{
    public void ByRef(ref int x, int y, out int z) { x = y = z = -1; }
}

var type = typeof(Test);
var method = type.GetMethod("ByRef");
var caller = method.DelegateForCall();
var args = new object [] { 1, 2, 3 };
var inst = new Test();
caller(inst, args);
Console.WriteLine(args[0]); // -1
Console.WriteLine(args[1]); // 2
Console.WriteLine(args[2]); // -1

DelegateForCall returns an open-instance delegate that calls the ByRef method on a Test object with the given arguments. So one could deduce its definition:

public delegate object MethodCaller(object target, object[] args);

But it's actually strongly typed (I deal with both strongly and weakly typed targets), so it really looks like this:

public delegate TReturn MethodCaller<TTarget, TReturn>(TTarget target, object[] args);

The code works as expected. I will show the code I'm using to generate the caller delegate, but first let me show what I expect it to generate. DelegateForCall basically returns DelegateForCall<object, object>, so it's weakly typed. In that case I expect it to generate the following:

public static object MethodCaller(object target, object[] args)
{
   Test tmp = (Test)target;
   int arg0 = (int)args[0];
   int arg1 = (int)args[1];
   int arg2 = (int)args[2];
   tmp.ByRef(ref arg0, arg1, out arg2);
   args[0] = arg0;
   args[2] = arg2;
   return null;
}

Unfortunately, when I open the test assembly (which I generate for debugging purposes) in ILSpy, it shows this C# code:

public static object MethodCaller(object target, object[] args)
{
    Program.Test test = (Program.Test)target;
    Program.Test arg_39_0 = test;
    int num = (int)args[0];
    int num2 = (int)args[1];
    int arg_39_2 = num2;
    int num3 = (int)args[2];
    arg_39_0.ByRef(ref num, arg_39_2, ref num3);
    args[0] = num;
    args[2] = num3;
    return null;
}

I can't understand why it declared arg_39_0 and arg_39_2. In my code, I declare one local to store the target and one local per parameter to hold the value read from the args array. So in total, we should see 4 locals.

Here's the code I'm using:

    static void GenerateMethodInvocation<TTarget>(MethodInfo method)
    {
        var weaklyTyped = typeof(TTarget) == typeof(object);

        // push target if not static (instance-method. in that case first arg0 is always 'this')
        if (!method.IsStatic)
        {
            var targetType = weaklyTyped ? method.DeclaringType : typeof(TTarget);
            emit.declocal(targetType);
            emit.ldarg0();
            if (weaklyTyped)
                emit.unbox_any(targetType);
            emit.stloc0()
                .ifclass_ldloc_else_ldloca(0, targetType);
        }

        // push arguments in order to call method
        var prams = method.GetParameters();
        for (int i = 0, imax = prams.Length; i < imax; i++)
        {
            emit.ldarg1()       // push array
                .ldc_i4(i)      // push index
                .ldelem_ref();  // pop array, index and push array[index]

            var param = prams[i];
            var dataType = param.ParameterType;

            if (dataType.IsByRef)
                dataType = dataType.GetElementType();

            var tmp = emit.declocal(dataType);
            emit.unbox_any(dataType)
                .stloc(tmp)
                .ifbyref_ldloca_else_ldloc(tmp, param.ParameterType);
        }

        // perform the correct call (pushes the result)
        emit.callorvirt(method);

        // assign byref values back to the args array
        // if method wasn't static that means we declared a temp local to load the target
        // that means our local variables index for the arguments start from 1
        int localVarStart = method.IsStatic ? 0 : 1;
        for (int i = 0; i < prams.Length; i++)
        {
            var paramType = prams[i].ParameterType;
            if (paramType.IsByRef)
            {
                var byRefType = paramType.GetElementType();
                emit.ldarg1()
                    .ldc_i4(i)
                    .ldloc(i + localVarStart);
                if (byRefType.IsValueType)
                    emit.box(byRefType);
                emit.stelem_ref();
            }
        }

        if (method.ReturnType == typeof(void))
            emit.ldnull();
        else if (weaklyTyped)
            emit.ifvaluetype_box(method.ReturnType);

        emit.ret();
    }

'emit' is basically a helper I use to emit opcodes (source)
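
For readers without the helper, roughly the same emission can be sketched directly against ILGenerator. This is only a sketch under assumptions: CallerSketch and Build are hypothetical names standing in for DelegateForCall, it handles only the weakly typed case for instance methods on classes, and it keeps the LocalBuilder for each parameter instead of computing local indices.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class Test
{
    public void ByRef(ref int x, int y, out int z) { x = y = z = -1; }
}

public delegate object MethodCaller(object target, object[] args);

// Hypothetical stand-in for DelegateForCall (instance methods on classes only).
public static class CallerSketch
{
    public static MethodCaller Build(MethodInfo method)
    {
        var dm = new DynamicMethod("MethodCaller", typeof(object),
            new[] { typeof(object), typeof(object[]) }, typeof(CallerSketch).Module, true);
        ILGenerator il = dm.GetILGenerator();

        // Cast the target, store it in a local, then push it as 'this'.
        LocalBuilder target = il.DeclareLocal(method.DeclaringType);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Castclass, method.DeclaringType);
        il.Emit(OpCodes.Stloc, target);
        il.Emit(OpCodes.Ldloc, target);

        // One local per parameter: unbox args[i] into it, then push the
        // local's address (ref/out) or its value (by-value parameter).
        ParameterInfo[] prams = method.GetParameters();
        var locals = new LocalBuilder[prams.Length];
        for (int i = 0; i < prams.Length; i++)
        {
            Type paramType = prams[i].ParameterType;
            Type dataType = paramType.IsByRef ? paramType.GetElementType() : paramType;
            locals[i] = il.DeclareLocal(dataType);

            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Ldc_I4, i);
            il.Emit(OpCodes.Ldelem_Ref);
            il.Emit(OpCodes.Unbox_Any, dataType);
            il.Emit(OpCodes.Stloc, locals[i]);
            il.Emit(paramType.IsByRef ? OpCodes.Ldloca : OpCodes.Ldloc, locals[i]);
        }

        il.Emit(OpCodes.Callvirt, method);

        // Copy ref/out results back into the args array.
        for (int i = 0; i < prams.Length; i++)
        {
            Type paramType = prams[i].ParameterType;
            if (!paramType.IsByRef) continue;
            Type dataType = paramType.GetElementType();

            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Ldc_I4, i);
            il.Emit(OpCodes.Ldloc, locals[i]);
            if (dataType.IsValueType) il.Emit(OpCodes.Box, dataType);
            il.Emit(OpCodes.Stelem_Ref);
        }

        // A void method returns null; a value-type result is boxed.
        if (method.ReturnType == typeof(void)) il.Emit(OpCodes.Ldnull);
        else if (method.ReturnType.IsValueType) il.Emit(OpCodes.Box, method.ReturnType);
        il.Emit(OpCodes.Ret);

        return (MethodCaller)dm.CreateDelegate(typeof(MethodCaller));
    }
}
```

Calling CallerSketch.Build(typeof(Test).GetMethod("ByRef")) and invoking the result with args { 1, 2, 3 } should leave the array as { -1, 2, -1 }, matching the usage shown at the top of the question.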

Finally, here's the IL code as shown in ILSpy. It matches the C# I expected, not the C# ILSpy actually produced (the version with the two redundant local variables):

.method public hidebysig static 
    object MethodCaller (
        object target,
        object[] args
    ) cil managed 
{
    // Method begins at RVA 0x2050
    // Code size 100 (0x64)
    .maxstack 5
    .locals init (
        [0] class [CustomSerializer]CustomSerializer.Program/Test,
        [1] int32,
        [2] int32,
        [3] int32
    )

    IL_0000: ldarg.0
    IL_0001: unbox.any [CustomSerializer]CustomSerializer.Program/Test
    IL_0006: stloc.0
    IL_0007: ldloc 0
    IL_000b: nop
    IL_000c: nop
    IL_000d: ldarg.1
    IL_000e: ldc.i4 0
    IL_0013: ldelem.ref
    IL_0014: unbox.any [mscorlib]System.Int32
    IL_0019: stloc.1
    IL_001a: ldloca.s 1
    IL_001c: ldarg.1
    IL_001d: ldc.i4 1
    IL_0022: ldelem.ref
    IL_0023: unbox.any [mscorlib]System.Int32
    IL_0028: stloc.2
    IL_0029: ldloc.2
    IL_002a: ldarg.1
    IL_002b: ldc.i4 2
    IL_0030: ldelem.ref
    IL_0031: unbox.any [mscorlib]System.Int32
    IL_0036: stloc.3
    IL_0037: ldloca.s 3
    IL_0039: call instance void [CustomSerializer]CustomSerializer.Program/Test::ByRef(int32&, int32, int32&)
    IL_003e: ldarg.1
    IL_003f: ldc.i4 0
    IL_0044: ldloc 1
    IL_0048: nop
    IL_0049: nop
    IL_004a: box [mscorlib]System.Int32
    IL_004f: stelem.ref
    IL_0050: ldarg.1
    IL_0051: ldc.i4 2
    IL_0056: ldloc 3
    IL_005a: nop
    IL_005b: nop
    IL_005c: box [mscorlib]System.Int32
    IL_0061: stelem.ref
    IL_0062: ldnull
    IL_0063: ret
} // end of method Test::MethodCaller

Note how the IL clearly declares 4 local variables, yet ILSpy's C# shows 6!

Note the generated assembly passes peverify verification.

Why doesn't the C# in ILSpy look like what I had in mind? Why does it show 6 local variables when there are actually only 4?

Edit: Here's what dotPeek shows, which is even weirder:

  public static object MethodCaller(object target, object[] args)
  {
    Program.Test test = (Program.Test) target;
    int num1 = (int) args[0];
    // ISSUE: explicit reference operation
    // ISSUE: variable of a reference type
    int& x = @num1;
    int y = (int) args[1];
    int num2 = (int) args[2];
    // ISSUE: explicit reference operation
    // ISSUE: variable of a reference type
    int& z = @num2;
    test.ByRef(x, y, z);
    args[0] = (object) num1;
    args[2] = (object) num2;
    return (object) null;
  }

Solution

  • The int& x = @num1; statements create a reference to num1. This is needed to call a method that takes a ref parameter.

    If you call a method:

    public void ByRef(ref int x, int y, out int z)
    

    that means you are passing references to x and z. C# lets you express this very neatly at the source level, but at the IL level it's less obvious because the instruction set is limited. As a result, the ByRef method is translated as:

    public void ByRef(int& x, int y, int& z)
    

    and the caller first has to compute the references (in the IL above, that is what ldloca.s does). A decompiler always has trouble understanding what is going on, especially if the code is optimized. A pattern that looks easy to a human is in general much harder for a machine to recognize.


    Another reason why new variables are declared is that, in general, the arguments of a call are pushed onto the evaluation stack one by one. So you do something like:

    push arg0
    push arg1
    push arg2
    call method
    

    to do the equivalent of:

    method(arg0,arg1,arg2)
    

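    These pushes can be interleaved with other work: you push something on the stack, then run other operations before the call consumes it. It is hard to keep track of which value is located where and whether it still equals the variable it was loaded from. By using "new variables" in the decompiling process, you are sure you don't do anything wrong. The names in the ILSpy output fit this pattern: the call sits at IL_0039, and arg_39_0 and arg_39_2 appear to stand for its operands 0 and 2 (the target and y), values that were already on the stack while later statements stored into other locals.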


    Short version:

    You must always generate a reference to the values first. Since the references are of a different type than int (int is not the same as int&), the decompiler decided to use new variables. But decompiling is never perfect: infinitely many programs can result in the same IL code.

    A decompiler should be conservative: it starts from IL code (or something equivalent) and tries to make sense of it, which is not easy. A decompiler applies a set of "rules" repeatedly to get the code into a readable state. These rules are conservative: each must guarantee that the code after the rule is equivalent to the code before it. Here it is better to be safe than sorry, and introducing additional variables is sometimes a necessary precaution.
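
To illustrate why such a temporary can be necessary, here is a small sketch in plain C# (StackSlotSketch and its members are hypothetical names, not decompiler output). It mimics IL that loads a local onto the stack, then stores a new value into that local, and only then performs the call. Replacing the stack slot with the local itself would change the result.

```csharp
using System;

public static class StackSlotSketch
{
    static int Combine(int first, int second)
    {
        return first * 10 + second;
    }

    // Mirrors the IL order: the value of 'a' is pushed for argument 0,
    // then 'a' is overwritten, then the call consumes both operands.
    public static int WithTemporary()
    {
        int a = 1;
        int slot0 = a;              // value already sitting on the stack
        a = 2;                      // later store to the same local
        return Combine(slot0, a);   // 1 * 10 + 2 = 12
    }

    // Naively inlining slot0 reads 'a' after the store: a different program.
    public static int NaivelyInlined()
    {
        int a = 1;
        a = 2;
        return Combine(a, a);       // 2 * 10 + 2 = 22
    }
}
```

In the question's IL nothing overwrites the target or y between the load and the call, so the two extra variables are redundant there. Proving that requires extra analysis, though, and a conservative rule may simply keep the temporaries.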