I'm creating an open-instance delegate from a DynamicMethod to call a method on a certain target. The code handles by-ref parameters as well as static methods.
See the following:
public class Test
{
    public void ByRef(ref int x, int y, out int z) { x = y = z = -1; }
}
var type = typeof(Test);
var method = type.GetMethod("ByRef");
var caller = method.DelegateForCall();
var args = new object [] { 1, 2, 3 };
var inst = new Test();
caller(inst, args);
Console.WriteLine(args[0]); // -1
Console.WriteLine(args[1]); // 2
Console.WriteLine(args[2]); // -1
DelegateForCall returns an open-instance delegate to call the ByRef method on a Test object given some arguments. So one could deduce its definition:
public delegate object MethodCaller(object target, object[] args);
But it's actually strongly typed (I deal with both strongly and weakly typed targets), so it really looks like this:
public delegate TReturn MethodCaller<TTarget, TReturn>(TTarget target, object[] args);
The code works as expected. I will show you the code I'm using to generate the caller delegate, but first let me show what I'm expecting it to generate. DelegateForCall basically returns DelegateForCall<object, object>, so it's weakly typed. In that case I expect it to generate the following:
public static object MethodCaller(object target, object[] args)
{
    Test tmp = (Test)target;
    int arg0 = (int)args[0];
    int arg1 = (int)args[1];
    int arg2 = (int)args[2];
    tmp.ByRef(ref arg0, arg1, out arg2);
    args[0] = arg0;
    args[2] = arg2;
    return null;
}
Unfortunately, when I open the test assembly I generate (for debugging purposes) in ILSpy, it shows this C# code:
public static object MethodCaller(object target, object[] args)
{
    Program.Test test = (Program.Test)target;
    Program.Test arg_39_0 = test;
    int num = (int)args[0];
    int num2 = (int)args[1];
    int arg_39_2 = num2;
    int num3 = (int)args[2];
    arg_39_0.ByRef(ref num, arg_39_2, ref num3);
    args[0] = num;
    args[2] = num3;
    return null;
}
I'm unable to understand why it declared arg_39_0 and arg_39_2. In my code, I declare a local to store the target, and locals to hold the values read from the args array. So in total, we should see 4 locals.
Here's the code I'm using:
static void GenerateMethodInvocation<TTarget>(MethodInfo method)
{
    var weaklyTyped = typeof(TTarget) == typeof(object);

    // push target if not static (instance-method. in that case first arg0 is always 'this')
    if (!method.IsStatic)
    {
        var targetType = weaklyTyped ? method.DeclaringType : typeof(TTarget);
        emit.declocal(targetType);
        emit.ldarg0();
        if (weaklyTyped)
            emit.unbox_any(targetType);
        emit.stloc0()
            .ifclass_ldloc_else_ldloca(0, targetType);
    }

    // push arguments in order to call method
    var prams = method.GetParameters();
    for (int i = 0, imax = prams.Length; i < imax; i++)
    {
        emit.ldarg1()       // push array
            .ldc_i4(i)      // push index
            .ldelem_ref();  // pop array, index and push array[index]

        var param = prams[i];
        var dataType = param.ParameterType;
        if (dataType.IsByRef)
            dataType = dataType.GetElementType();

        var tmp = emit.declocal(dataType);
        emit.unbox_any(dataType)
            .stloc(tmp)
            .ifbyref_ldloca_else_ldloc(tmp, param.ParameterType);
    }

    // perform the correct call (pushes the result)
    emit.callorvirt(method);

    // assign byref values back to the args array
    // if method wasn't static that means we declared a temp local to load the target
    // that means our local variables index for the arguments start from 1
    int localVarStart = method.IsStatic ? 0 : 1;
    for (int i = 0; i < prams.Length; i++)
    {
        var paramType = prams[i].ParameterType;
        if (paramType.IsByRef)
        {
            var byRefType = paramType.GetElementType();
            emit.ldarg1()
                .ldc_i4(i)
                .ldloc(i + localVarStart);
            if (byRefType.IsValueType)
                emit.box(byRefType);
            emit.stelem_ref();
        }
    }

    if (method.ReturnType == typeof(void))
        emit.ldnull();
    else if (weaklyTyped)
        emit.ifvaluetype_box(method.ReturnType);
    emit.ret();
}
'emit' is basically a helper I use to emit opcodes (source)
Finally, here's the IL code as shown in ILSpy. It is consistent with the C# I expected, not with the C# that ILSpy actually produced (the one with the two redundant extra local variables):
.method public hidebysig static
object MethodCaller (
object target,
object[] args
) cil managed
{
// Method begins at RVA 0x2050
// Code size 100 (0x64)
.maxstack 5
.locals init (
[0] class [CustomSerializer]CustomSerializer.Program/Test,
[1] int32,
[2] int32,
[3] int32
)
IL_0000: ldarg.0
IL_0001: unbox.any [CustomSerializer]CustomSerializer.Program/Test
IL_0006: stloc.0
IL_0007: ldloc 0
IL_000b: nop
IL_000c: nop
IL_000d: ldarg.1
IL_000e: ldc.i4 0
IL_0013: ldelem.ref
IL_0014: unbox.any [mscorlib]System.Int32
IL_0019: stloc.1
IL_001a: ldloca.s 1
IL_001c: ldarg.1
IL_001d: ldc.i4 1
IL_0022: ldelem.ref
IL_0023: unbox.any [mscorlib]System.Int32
IL_0028: stloc.2
IL_0029: ldloc.2
IL_002a: ldarg.1
IL_002b: ldc.i4 2
IL_0030: ldelem.ref
IL_0031: unbox.any [mscorlib]System.Int32
IL_0036: stloc.3
IL_0037: ldloca.s 3
IL_0039: call instance void [CustomSerializer]CustomSerializer.Program/Test::ByRef(int32&, int32, int32&)
IL_003e: ldarg.1
IL_003f: ldc.i4 0
IL_0044: ldloc 1
IL_0048: nop
IL_0049: nop
IL_004a: box [mscorlib]System.Int32
IL_004f: stelem.ref
IL_0050: ldarg.1
IL_0051: ldc.i4 2
IL_0056: ldloc 3
IL_005a: nop
IL_005b: nop
IL_005c: box [mscorlib]System.Int32
IL_0061: stelem.ref
IL_0062: ldnull
IL_0063: ret
} // end of method Test::MethodCaller
Note how it clearly states there are 4 local variables, and yet the ILSpy C# shows 6!
Note that the generated assembly passes peverify verification.
Why doesn't the C# in ILSpy look like what I had in mind? Why does it show 6 local variables when there are actually only 4?
Edit: Here's what dotPeek shows, which is even weirder:
public static object MethodCaller(object target, object[] args)
{
    Program.Test test = (Program.Test) target;
    int num1 = (int) args[0];
    // ISSUE: explicit reference operation
    // ISSUE: variable of a reference type
    int& x = @num1;
    int y = (int) args[1];
    int num2 = (int) args[2];
    // ISSUE: explicit reference operation
    // ISSUE: variable of a reference type
    int& z = @num2;
    test.ByRef(x, y, z);
    args[0] = (object) num1;
    args[2] = (object) num2;
    return (object) null;
}
The int& x = @num1; statements generate a reference to num1. This is done to perform a method call with a ref argument.
If you call a method:
public void ByRef(ref int x, int y, out int z)
that means you are passing references to x and z. C# lets you express this very neatly at the source level, but at the IL level it is less obvious because there is only a limited instruction set. As a result, the ByRef method is translated as:
public void ByRef(int& x, int y, int& z)
and you first need to compute the references. A decompiler always has trouble understanding what is going on, especially if the code is optimized. Although this might look like an easy pattern to a human, it is in general much harder for a machine.
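To see that ref and out both end up as the same by-ref type at the CLR level, you can inspect the method with reflection. This is only a small sketch: the Test class is copied from the question, while ByRefInspection and Describe are names made up for the illustration.

```csharp
using System;
using System.Reflection;

public class Test
{
    public void ByRef(ref int x, int y, out int z) { x = y = z = -1; }
}

public static class ByRefInspection
{
    // Describes one parameter: its name, its CLR type, and the by-ref/out flags.
    public static string Describe(ParameterInfo p) =>
        $"{p.Name}: {p.ParameterType}, IsByRef={p.ParameterType.IsByRef}, IsOut={p.IsOut}";

    public static void Main()
    {
        foreach (var p in typeof(Test).GetMethod("ByRef").GetParameters())
            Console.WriteLine(Describe(p));
        // x: System.Int32&, IsByRef=True, IsOut=False
        // y: System.Int32, IsByRef=False, IsOut=False
        // z: System.Int32&, IsByRef=True, IsOut=True
    }
}
```

Both x and z have the type System.Int32&; out is only an extra flag on the parameter, which is why the decompiled call shows ref num3 rather than out num3.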
Another reason why new variables are declared is that, in general, when a list of arguments is generated, the arguments are pushed onto the evaluation stack. So you do something like:
push arg0
push arg1
push arg2
call method
To do something like:
method(arg0,arg1,arg2)
Now the calculations can sometimes be interleaved: you push something on the stack, then pop it to perform some operation, and so on. It is hard to keep track of which variable is located where and whether it still has the same value as the original one. By introducing "new variables" during decompilation, the decompiler makes sure it doesn't do anything wrong.
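A small sketch of why the extra variable can matter. A value that was loaded onto the stack is a copy, so if the local changes before the call, substituting the local back into the call expression would change the result. StackSlotDemo and its method names are invented for this illustration; in the question's IL the intervening instructions only store to other locals, so the temporaries are harmless there, but the decompiler may not bother to prove that.

```csharp
using System;

public static class StackSlotDemo
{
    static int Add(int a, int b) => a + b;

    // Mirrors the IL order: load the value, change the local, then call.
    public static int Faithful()
    {
        int num = 1;
        int stackSlot = num;        // like "ldloc num": the value is copied onto the stack
        num = 10;                   // like a later "stloc num"
        return Add(stackSlot, num); // uses the old copy and the new value
    }

    // What you would get by inlining the local instead of keeping the copy.
    public static int NaivelyInlined()
    {
        int num = 1;
        num = 10;
        return Add(num, num);
    }

    public static void Main()
    {
        Console.WriteLine(Faithful());       // 11
        Console.WriteLine(NaivelyInlined()); // 20
    }
}
```

Keeping a temporary for each stack slot, as with arg_39_0 and arg_39_2, always preserves the original behavior, even when it turns out to be redundant.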
Short version:
You must always generate a reference to the values first. Since the references are of a different type than int (int is not the same as int&), the decompiler decided to use new variables. But decompiling is never perfect: infinitely many programs can result in the same IL code.
A decompiler should be conservative: it starts from IL code (or something equivalent) and tries to make sense of it, which is not easy. A decompiler uses a set of "rules" that are applied repeatedly to get the code into a readable state. These rules are conservative: each one must guarantee that the code after applying it is equivalent to the code before. Here it is better to be safe than sorry, and introducing additional variables is sometimes a necessary precaution.
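If the extra variables bother you, one thing you could try is to emit the IL so that nothing sits on the stack while other statements run: unbox all arguments into locals first, then push the target and the arguments back to back right before the call. The following is only a sketch under assumptions, not the code from the question: it uses System.Reflection.Emit directly instead of the emit helper, it handles only void instance methods on reference types, and CallerSketch and Build are made-up names. Whether a given decompiler then prints the C# without temporaries is something to verify yourself.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class Test
{
    public void ByRef(ref int x, int y, out int z) { x = y = z = -1; }
}

public static class CallerSketch
{
    // Assumption: 'method' is a void instance method declared on a reference type.
    public static Func<object, object[], object> Build(MethodInfo method)
    {
        var dm = new DynamicMethod("MethodCaller", typeof(object),
            new[] { typeof(object), typeof(object[]) }, typeof(CallerSketch).Module, true);
        var il = dm.GetILGenerator();
        var prams = method.GetParameters();
        var locals = new LocalBuilder[prams.Length];

        // Phase 1: unbox every argument into its own local; the stack is empty after each store.
        for (int i = 0; i < prams.Length; i++)
        {
            var type = prams[i].ParameterType;
            if (type.IsByRef) type = type.GetElementType();
            locals[i] = il.DeclareLocal(type);
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Ldc_I4, i);
            il.Emit(OpCodes.Ldelem_Ref);
            il.Emit(OpCodes.Unbox_Any, type);
            il.Emit(OpCodes.Stloc, locals[i]);
        }

        // Phase 2: push the target and all arguments back to back, then call.
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Castclass, method.DeclaringType);
        for (int i = 0; i < prams.Length; i++)
            il.Emit(prams[i].ParameterType.IsByRef ? OpCodes.Ldloca : OpCodes.Ldloc, locals[i]);
        il.Emit(OpCodes.Callvirt, method);

        // Phase 3: copy by-ref results back into the args array.
        for (int i = 0; i < prams.Length; i++)
        {
            if (!prams[i].ParameterType.IsByRef) continue;
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Ldc_I4, i);
            il.Emit(OpCodes.Ldloc, locals[i]);
            if (locals[i].LocalType.IsValueType)
                il.Emit(OpCodes.Box, locals[i].LocalType);
            il.Emit(OpCodes.Stelem_Ref);
        }

        il.Emit(OpCodes.Ldnull);
        il.Emit(OpCodes.Ret);
        return (Func<object, object[], object>)dm.CreateDelegate(typeof(Func<object, object[], object>));
    }

    public static void Main()
    {
        var caller = Build(typeof(Test).GetMethod("ByRef"));
        var args = new object[] { 1, 2, 3 };
        caller(new Test(), args);
        Console.WriteLine($"{args[0]}, {args[1]}, {args[2]}"); // -1, 2, -1
    }
}
```

The behavior is the same as in the question's version; only the order in which values are loaded changes, so that no value has to survive on the stack across another statement.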