I have this program in C#:
using System;
class Program
{
public static void Main()
{
int i = 4;
double d = 12.34;
double PI = Math.PI;
string name = "Ehsan";
}
}
When I compile it, the following is the IL the compiler generates for Main:
.method public hidebysig static void Main() cil managed
{
.entrypoint
// Code size 30 (0x1e)
.maxstack 1
.locals init (int32 V_0,
float64 V_1,
float64 V_2,
string V_3)
IL_0000: nop
IL_0001: ldc.i4.4
IL_0002: stloc.0
IL_0003: ldc.r8 12.34
IL_000c: stloc.1
IL_000d: ldc.r8 3.1415926535897931
IL_0016: stloc.2
IL_0017: ldstr "Ehsan"
IL_001c: stloc.3
IL_001d: ret
} // end of method Program::Main
That's fine and I understand it. Now, if I add another integer variable, something different is generated. Here is the modified C# code:
using System;
class Program
{
public static void Main()
{
int unassigned;
int i = 4;
unassigned = i;
double d = 12.34;
double PI = Math.PI;
string name = "Ehsan";
}
}
and here is the IL generated against the above c# code:
.method public hidebysig static void Main() cil managed
{
.entrypoint
// Code size 33 (0x21)
.maxstack 1
.locals init (int32 V_0,
int32 V_1,
float64 V_2,
float64 V_3,
string V_4)
IL_0000: nop
IL_0001: ldc.i4.4
IL_0002: stloc.1
IL_0003: ldloc.1
IL_0004: stloc.0
IL_0005: ldc.r8 12.34
IL_000e: stloc.2
IL_000f: ldc.r8 3.1415926535897931
IL_0018: stloc.3
IL_0019: ldstr "Ehsan"
IL_001e: stloc.s V_4 // what is happening here in this case
IL_0020: ret
} // end of method Program::Main
Note that the stloc.s instruction is now generated with V_4, which is a local, but I am not clear about this. I also don't understand the purpose of these locals, I mean these:
.locals init (int32 V_0,
float64 V_1,
float64 V_2,
string V_3)
Some things to note.
First, this is presumably a debug build, or at least has certain optimisations turned off in the compilation. What I would expect to see here is:
.method public hidebysig static void Main () cil managed
{
.entrypoint
IL_0000: ret
}
Which is to say, since those locals aren't used, I'd expect the compiler to just skip them entirely. It won't on a debug build, but this stands as a good example of how there can be considerable difference between what the C# says and what the IL says.
The next thing to note is how an IL method is structured. You have an array of local variables of various types, defined with the .locals block. These will generally correspond pretty closely to what the C# had, though there'll often be shortcuts and rearrangements made.
Finally, we have the set of instructions, which all act upon those locals, any arguments, and an evaluation stack that the instructions push values onto, pop values from, and otherwise operate on.
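To make that structure concrete, here is a toy model in C#. ToyIl is a hypothetical class, and this is not how the CLR really runs IL (the JIT compiles it to native code); it only understands the handful of instructions Main uses, and it simplifies operands so that stloc/ldloc take the local's index directly. It runs the modified Main against an object[] of locals and a Stack<object>, and records how deep the stack gets.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy interpreter: a fixed array of local slots plus an
// evaluation stack. Illustration only, not the real CLR.
public static class ToyIl
{
    // The modified Main from the question, as (opcode, operand) pairs.
    // Local indices: 0 = unassigned, 1 = i, 2 = d, 3 = PI, 4 = name.
    public static readonly (string Op, object Arg)[] ModifiedMain =
    {
        ("nop", null), ("ldc.i4", 4), ("stloc", 1), ("ldloc", 1), ("stloc", 0),
        ("ldc.r8", 12.34), ("stloc", 2), ("ldc.r8", Math.PI), ("stloc", 3),
        ("ldstr", "Ehsan"), ("stloc", 4), ("ret", null),
    };

    // Executes the instructions; returns the final locals and the maximum stack depth seen.
    public static (object[] Locals, int MaxStack) Run(int localCount, (string Op, object Arg)[] code)
    {
        var locals = new object[localCount];
        var stack = new Stack<object>();
        int maxStack = 0;

        foreach (var (op, arg) in code)
        {
            switch (op)
            {
                case "nop":
                    break;                            // do nothing
                case "ldc.i4":
                case "ldc.r8":
                case "ldstr":
                    stack.Push(arg);                  // push a constant
                    break;
                case "stloc":
                    locals[(int)arg] = stack.Pop();   // pop into a local slot
                    break;
                case "ldloc":
                    stack.Push(locals[(int)arg]);     // copy a local onto the stack
                    break;
                case "ret":
                    return (locals, maxStack);
                default:
                    throw new NotSupportedException(op);
            }
            maxStack = Math.Max(maxStack, stack.Count);
        }
        return (locals, maxStack);
    }
}
```

Calling ToyIl.Run(5, ToyIl.ModifiedMain) leaves the locals holding 4, 4, 12.34, Math.PI and "Ehsan", and the stack never holds more than one value, which is what .maxstack 1 declares.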
The next thing to note is that the IL you see here is a sort of assembly language for bytecode: every instruction maps one-to-one onto an opcode of one or two bytes, and every operand also consumes a certain number of bytes. So, for example, stloc V_4 (not actually present in your examples, but we'll come to that) would map to 0xFE 0x0E 0x04 0x00, where 0xFE 0x0E is the encoding of stloc and 0x04 0x00 is that of 4 (as a little-endian 16-bit value), the index of the local in question. It means "pop the value off the top of the stack, and store it in the 5th (index 4) local".
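As a sketch of that encoding, the hypothetical StlocLongForm helper below builds the four bytes by hand. It takes the opcode value from the real System.Reflection.Emit.OpCodes.Stloc field (whose Value is a short, so it is reinterpreted as 0xFE0E) and appends the local index as an unsigned 16-bit little-endian value.

```csharp
using System.Reflection.Emit;

// Hypothetical helper: encodes the long form "stloc <index>".
public static class StlocLongForm
{
    public static byte[] Encode(ushort index)
    {
        // OpCodes.Stloc.Value is a short; reinterpret its bits as 0xFE0E.
        ushort opcode = unchecked((ushort)OpCodes.Stloc.Value);
        return new byte[]
        {
            (byte)(opcode >> 8),    // 0xFE: first byte of every two-byte opcode
            (byte)(opcode & 0xFF),  // 0x0E: second opcode byte
            (byte)(index & 0xFF),   // low byte of the local index
            (byte)(index >> 8),     // high byte of the local index
        };
    }
}
```

StlocLongForm.Encode(4) gives 0xFE 0x0E 0x04 0x00, matching the bytes above.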
Now, there are a few abbreviations here. One of these is the .s "short" form of several instructions (_S in the name of the equivalent System.Reflection.Emit.OpCodes field, e.g. OpCodes.Stloc_S). These are variants of other instructions that take a one-byte operand (signed or unsigned depending on the instruction) where the other form takes a two- or four-byte operand, generally an index or a relative distance to jump. So instead of stloc V_4 we can have stloc.s V_4, which is only 0x13 0x04, and so is smaller.
Then there are some variants that include a particular value in the instruction itself. So instead of either stloc V_0 or stloc.s V_0 we can just use stloc.0, which is the single byte 0x0A.
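Reflection.Emit describes these three forms directly: each OpCode has a Size (bytes of the opcode itself) and an OperandType (how big the operand that follows is). The EncodedLength helper below is a hypothetical illustration that only handles the operand types used by the stloc family.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper: total encoded length of a "store to local" instruction.
public static class StlocForms
{
    public static int EncodedLength(OpCode op)
    {
        int operandBytes = op.OperandType switch
        {
            OperandType.InlineVar => 2,      // stloc: uint16 local index
            OperandType.ShortInlineVar => 1, // stloc.s: uint8 local index
            OperandType.InlineNone => 0,     // stloc.0..stloc.3: index is part of the opcode
            _ => throw new NotSupportedException(op.Name),
        };
        return op.Size + operandBytes;       // Size counts only the opcode bytes
    }
}
```

For OpCodes.Stloc, OpCodes.Stloc_S and OpCodes.Stloc_0 this gives 4, 2 and 1 bytes respectively.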
This makes a lot of sense when you consider that it's common to have only a handful of locals in use at a time, so using either stloc.s or (better yet) the likes of stloc.0, stloc.1, etc. gives tiny savings that add up to quite a lot.
But only so much. If we had e.g. stloc.252, stloc.253, etc., there would be a lot of such instructions, more bytes would be needed to encode each instruction, and overall it would be a loss. The super-short forms of the local-related (stloc, ldloc) and argument-related (ldarg) instructions only go up to 3. (There is a starg and starg.s but no starg.0 etc., as storing to arguments is relatively rare.) ldc.i4/ldc.i4.s (push a constant 32-bit signed value onto the stack) has super-short versions going from ldc.i4.0 to ldc.i4.8, and also ldc.i4.m1 for -1.
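The choice a compiler makes between these forms can be sketched as "use the smallest encoding whose operand range fits". ShortestForms is a hypothetical helper, not part of any library; the opcode byte values are the ECMA-335 ones (stloc.0 = 0x0A, stloc.s = 0x13, ldc.i4.m1 = 0x15, ldc.i4.0 = 0x16, ldc.i4.s = 0x1F, ldc.i4 = 0x20).

```csharp
// Hypothetical helper: picks the shortest encoding for two common instructions.
public static class ShortestForms
{
    // "store to local <index>":
    // stloc.0..stloc.3 (1 byte), stloc.s <uint8> (2 bytes), stloc <uint16> (4 bytes).
    public static byte[] Stloc(ushort index)
    {
        if (index <= 3)
            return new[] { (byte)(0x0A + index) };          // stloc.0 = 0x0A ... stloc.3 = 0x0D
        if (index <= byte.MaxValue)
            return new byte[] { 0x13, (byte)index };        // stloc.s
        return new byte[] { 0xFE, 0x0E, (byte)(index & 0xFF), (byte)(index >> 8) }; // stloc
    }

    // "push 32-bit constant <value>":
    // ldc.i4.m1 / ldc.i4.0..ldc.i4.8 (1 byte), ldc.i4.s <int8> (2 bytes), ldc.i4 <int32> (5 bytes).
    public static byte[] LdcI4(int value)
    {
        if (value >= -1 && value <= 8)
            return new[] { (byte)(0x16 + value) };          // ldc.i4.m1 = 0x15 ... ldc.i4.8 = 0x1E
        if (value >= sbyte.MinValue && value <= sbyte.MaxValue)
            return new byte[] { 0x1F, unchecked((byte)value) }; // ldc.i4.s, operand is a signed byte
        return new byte[]                                   // ldc.i4, operand little-endian
        {
            0x20,
            unchecked((byte)value),
            unchecked((byte)(value >> 8)),
            unchecked((byte)(value >> 16)),
            unchecked((byte)(value >> 24)),
        };
    }
}
```

So ShortestForms.Stloc(4) is 0x13 0x04 (the stloc.s V_4 in your disassembly) and ShortestForms.LdcI4(4) is the single byte 0x1A (ldc.i4.4).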
It's also worth noting that V_4 doesn't exist in your code at all. Whatever you examined the IL with didn't know you'd used the variable name name, so it just used V_4. (What are you using, BTW? I use ILSpy for the most part, and if you had debug information associated with the file, it would have called it name accordingly.)
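You can see the same limitation through reflection: MethodBody.LocalVariables exposes each local's index and type, but no name, so anything working from the assembly alone has to invent names like V_0. The sketch below uses a hypothetical SumBelow method whose locals survive even an optimised build (they are live across the loop), and formats the slots the way a disassembler without debug information might.

```csharp
using System.Linq;
using System.Reflection;

public static class LocalSlots
{
    // Hypothetical sample method with locals that live across a loop.
    public static int SumBelow(int n)
    {
        int sum = 0;
        for (int k = 0; k < n; k++)
            sum += k;
        return sum;
    }

    // Describes each local slot by index and type only; the assembly stores no names.
    public static string[] Describe(MethodInfo method) =>
        method.GetMethodBody()
              .LocalVariables
              .Select(l => $"V_{l.LocalIndex}: {l.LocalType.Name}")
              .ToArray();
}
```

Calling LocalSlots.Describe(typeof(LocalSlots).GetMethod(nameof(LocalSlots.SumBelow))) yields entries such as "V_0: Int32"; the exact list depends on whether it was a debug or release build.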
So, to produce a commented, non-shortened version of your method with more comparable names, we could write the following CIL:
.method public hidebysig static void Main() cil managed
{
.entrypoint
.maxstack 1
.locals init (int32 unassigned,
int32 i,
float64 d,
float64 PI,
string name)
nop // Do Nothing (helps debugger to have some of these around).
ldc.i4 4 // Push number 4 on stack
stloc i // Pop value from stack, put in i (i = 4)
ldloc i // Push value in i on stack
stloc unassigned // Pop value from stack, put in unassigned (unassigned = i)
ldc.r8 12.34 // Push the 64-bit floating value 12.34 onto the stack
stloc d // Pop value from stack, put in d (d = 12.34)
ldc.r8 3.1415926535897931 // Push the 64-bit floating value 3.1415926535897931 onto the stack.
stloc PI // Pop the value from stack, put in PI (PI = 3.1415… which is the constant Math.PI)
ldstr "Ehsan" // Push the string "Ehsan" on stack
stloc name // Pop the value from stack, put in name
ret // return.
}
That will behave pretty much as your code does, but be a bit larger. So we replace stloc with stloc.0…stloc.3 where we can, with stloc.s where those aren't available but the index still fits in one byte, and ldc.i4 4 with ldc.i4.4, and we'll have shorter bytecode that does the same thing:
.method public hidebysig static void Main() cil managed
{
.entrypoint
.maxstack 1
.locals init (int32 unassigned,
int32 i,
float64 d,
float64 PI,
string name)
nop // Do Nothing (helps debugger to have some of these around).
ldc.i4.4 // Push number 4 on stack
stloc.1 // Pop value from stack, put in i (i = 4)
ldloc.1 // Push value in i on stack
stloc.0 // Pop value from stack, put in unassigned (unassigned = i)
ldc.r8 12.34 // Push the 64-bit floating value 12.34 onto the stack
stloc.2 // Pop value from stack, put in d (d = 12.34)
ldc.r8 3.1415926535897931 // Push the 64-bit floating value 3.1415926535897931 onto the stack.
stloc.3 // Pop the value from stack, put in PI (PI = 3.1415… which is the constant Math.PI)
ldstr "Ehsan" // Push the string "Ehsan" on stack
stloc.s name // Pop the value from stack, put in name
ret // return.
}
And now we have exactly the same code that your disassembly had, except that we've got better names. Remember, the names don't appear in the bytecode, so the disassembler couldn't do as good a job as we can.
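To put numbers on the saving, here is a hypothetical tally of the encoded size (opcode plus operand) of each instruction in the two hand-written versions. ldc.r8 carries an 8-byte double, ldstr a 4-byte metadata token, the long stloc/ldloc a 2-byte index and ldc.i4 a 4-byte integer.

```csharp
using System.Linq;

// Hypothetical size tally for the two hand-written versions of Main.
public static class MainCodeSize
{
    public static readonly (string Instr, int Bytes)[] LongForm =
    {
        ("nop", 1), ("ldc.i4 4", 5), ("stloc i", 4), ("ldloc i", 4), ("stloc unassigned", 4),
        ("ldc.r8 12.34", 9), ("stloc d", 4), ("ldc.r8 Math.PI", 9), ("stloc PI", 4),
        ("ldstr \"Ehsan\"", 5), ("stloc name", 4), ("ret", 1),
    };

    public static readonly (string Instr, int Bytes)[] ShortForm =
    {
        ("nop", 1), ("ldc.i4.4", 1), ("stloc.1", 1), ("ldloc.1", 1), ("stloc.0", 1),
        ("ldc.r8 12.34", 9), ("stloc.2", 1), ("ldc.r8 Math.PI", 9), ("stloc.3", 1),
        ("ldstr \"Ehsan\"", 5), ("stloc.s name", 2), ("ret", 1),
    };

    // Sum of the instruction sizes, i.e. the method's code size.
    public static int Total((string Instr, int Bytes)[] code) => code.Sum(i => i.Bytes);
}
```

The long form comes to 54 bytes and the short form to 33, which is the "Code size 33 (0x21)" your disassembler reported.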
Your question in a comment should really be another question, but it offers a chance to add something important that I only briefly noted above. Let's consider:
public static void Maybe(int a, int b)
{
if (a > b)
Console.WriteLine("Greater");
Console.WriteLine("Done");
}
Compile in debug and you end up with something like:
.method public hidebysig static
void Maybe (
int32 a,
int32 b
) cil managed
{
.maxstack 2
.locals init (
[0] bool CS$4$0000
)
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: cgt
IL_0005: ldc.i4.0
IL_0006: ceq
IL_0008: stloc.0
IL_0009: ldloc.0
IL_000a: brtrue.s IL_0017
IL_000c: ldstr "Greater"
IL_0011: call void [mscorlib]System.Console::WriteLine(string)
IL_0016: nop
IL_0017: ldstr "Done"
IL_001c: call void [mscorlib]System.Console::WriteLine(string)
IL_0021: nop
IL_0022: ret
}
Now, one thing to note is that labels like IL_0017 are added to every line based on the byte offset of the instruction. This makes life easier for the disassembler, but isn't really necessary unless a label is jumped to. Let's strip out all the labels that aren't jumped to:
.method public hidebysig static
void Maybe (
int32 a,
int32 b
) cil managed
{
.maxstack 2
.locals init (
[0] bool CS$4$0000
)
nop
ldarg.0
ldarg.1
cgt
ldc.i4.0
ceq
stloc.0
ldloc.0
brtrue.s IL_0017
ldstr "Greater"
call void [mscorlib]System.Console::WriteLine(string)
nop
IL_0017: ldstr "Done"
call void [mscorlib]System.Console::WriteLine(string)
nop
ret
}
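Since a label is just a byte offset, the labels in the debug listing can be recomputed as a running total of instruction sizes. IlOffsets is a hypothetical helper; the sizes are the encoded lengths of each instruction in Maybe (cgt and ceq are two-byte opcodes, brtrue.s has a one-byte operand, ldstr and call each carry a 4-byte token).

```csharp
using System.Collections.Generic;

// Hypothetical helper: reproduces the IL_xxxx labels from instruction sizes.
public static class IlOffsets
{
    // Debug-build Maybe(), as (mnemonic, encoded size in bytes).
    public static readonly (string Instr, int Size)[] Maybe =
    {
        ("nop", 1), ("ldarg.0", 1), ("ldarg.1", 1), ("cgt", 2),
        ("ldc.i4.0", 1), ("ceq", 2), ("stloc.0", 1), ("ldloc.0", 1),
        ("brtrue.s", 2), ("ldstr", 5), ("call", 5), ("nop", 1),
        ("ldstr", 5), ("call", 5), ("nop", 1), ("ret", 1),
    };

    // Each label is the running total of the sizes of all earlier instructions.
    public static List<string> Labels((string Instr, int Size)[] code)
    {
        var labels = new List<string>();
        int offset = 0;
        foreach (var (instr, size) in code)
        {
            labels.Add($"IL_{offset:x4}: {instr}");
            offset += size;
        }
        return labels;
    }
}
```

The thirteenth entry comes out as "IL_0017: ldstr", the branch target. Note that the one-byte operand of brtrue.s is relative to the start of the next instruction, so at IL_000a it encodes 0x17 - 0x0c = 0x0b.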
Now, let's consider what each line does:
.method public hidebysig static
void Maybe (
int32 a,
int32 b
) cil managed
{
.maxstack 2
.locals init (
[0] bool CS$4$0000
)
nop // Do nothing
ldarg.0 // Load first argument (index 0) onto stack.
ldarg.1 // Load second argument (index 1) onto stack.
cgt // Pop two values from stack, push 1 (true) if the first is greater
// than the second, 0 (false) otherwise.
ldc.i4.0 // Push 0 onto stack.
ceq // Pop two values from stack, push 1 (true) if the two are equal,
// 0 (false) otherwise.
stloc.0 // Pop value from stack, store in first local (index 0)
ldloc.0 // Load first local onto stack.
brtrue.s IL_0017 // Pop value from stack. If it's non-zero (true) jump to IL_0017
ldstr "Greater" // Load string "Greater" onto stack.
// Call Console.WriteLine(string)
call void [mscorlib]System.Console::WriteLine(string)
nop // Do nothing
IL_0017: ldstr "Done" // Load string "Done" onto stack.
// Call Console.WriteLine(string)
call void [mscorlib]System.Console::WriteLine(string)
nop // Do nothing
ret // return
}
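To check the comments above, the compare-and-branch part can be replayed on an explicit Stack<int>, one C# statement per instruction. MaybeBranch is a hypothetical helper; it returns whether brtrue.s would jump past the "Greater" call.

```csharp
using System.Collections.Generic;

// Hypothetical replay of: ldarg.0, ldarg.1, cgt, ldc.i4.0, ceq, stloc.0, ldloc.0, brtrue.s
public static class MaybeBranch
{
    public static bool JumpsOverGreater(int a, int b)
    {
        var stack = new Stack<int>();
        stack.Push(a);                          // ldarg.0
        stack.Push(b);                          // ldarg.1
        int right = stack.Pop(), left = stack.Pop();
        stack.Push(left > right ? 1 : 0);       // cgt: first pushed is the left operand
        stack.Push(0);                          // ldc.i4.0
        right = stack.Pop(); left = stack.Pop();
        stack.Push(left == right ? 1 : 0);      // ceq
        int local0 = stack.Pop();               // stloc.0
        stack.Push(local0);                     // ldloc.0
        return stack.Pop() != 0;                // brtrue.s: jump if non-zero
    }
}
```

With (5, 3) it does not jump, so "Greater" is printed; with (3, 5) or (4, 4) it jumps straight to IL_0017.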
Let's write this back into C# in a very literal, step-by-step way:
public static void Maybe(int a, int b)
{
bool shouldJump = (a > b) == false;
if (shouldJump) goto IL_0017;
Console.WriteLine("Greater");
IL_0017:
Console.WriteLine("Done");
}
Try that and you'll see it does the same thing. The goto is needed because CIL doesn't really have anything like for or while, or even blocks we can put after an if or else; it just has jumps and conditional jumps.
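One way to try that without eyeballing the console is to capture what both versions print and compare. The sketch assumes a hypothetical MaybeVersions class holding the original Maybe, the goto transcription (renamed MaybeDebugStyle so both can coexist), and a Capture helper that temporarily redirects Console.Out with Console.SetOut.

```csharp
using System;
using System.IO;

public static class MaybeVersions
{
    public static void Maybe(int a, int b)
    {
        if (a > b)
            Console.WriteLine("Greater");
        Console.WriteLine("Done");
    }

    // Literal transcription of the debug IL, with goto standing in for brtrue.s.
    public static void MaybeDebugStyle(int a, int b)
    {
        bool shouldJump = (a > b) == false;
        if (shouldJump) goto IL_0017;
        Console.WriteLine("Greater");
    IL_0017:
        Console.WriteLine("Done");
    }

    // Runs an action and returns everything it wrote to Console.Out.
    public static string Capture(Action action)
    {
        TextWriter original = Console.Out;
        var writer = new StringWriter();
        Console.SetOut(writer);
        try { action(); }
        finally { Console.SetOut(original); }
        return writer.ToString();
    }
}
```

For example, Capture(() => MaybeVersions.Maybe(5, 3)) and Capture(() => MaybeVersions.MaybeDebugStyle(5, 3)) both return "Greater" and "Done" on separate lines, while with (3, 5) both return only "Done".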
But why does it bother to store the value (what I called shouldJump in my C# rewrite) rather than just act on it?
It's just to make it easier to examine what is going on at each point when debugging. In particular, for a debugger to be able to stop at the point where a > b has been worked out but not yet acted on, either a > b or its opposite (a <= b) needs to be stored.
For that reason, debug builds tend to produce CIL that spends a lot of time recording what it just did. With a release build we'd get something more like:
.method public hidebysig static
void Maybe (
int32 a,
int32 b
) cil managed
{
ldarg.0 // Load first argument onto stack
ldarg.1 // Load second argument onto stack
ble.s IL_000e // Pop two values from stack. If the first is
// less than or equal to the second, goto IL_000e:
ldstr "Greater" // Load string "Greater" onto stack.
// Call Console.WriteLine(string)
call void [mscorlib]System.Console::WriteLine(string)
// Load string "Done" onto stack.
IL_000e: ldstr "Done"
// Call Console.WriteLine(string)
call void [mscorlib]System.Console::WriteLine(string)
ret
}
Or to do a similar line-by-line write back into C#:
public static void Maybe(int a, int b)
{
if (a <= b) goto IL_000e;
Console.WriteLine("Greater");
IL_000e:
Console.WriteLine("Done");
}
So you can see how the release build is more concisely doing the same thing.
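The release build can use a single ble.s only because, for int arguments, "not (a > b)" is exactly "a <= b". The hypothetical BranchEquivalence helper below checks that the debug-style and release-style branch conditions agree on a grid of values including the extremes. (This equivalence is specific to integers; floating-point comparisons involving NaN need different handling.)

```csharp
// Hypothetical check that the debug and release branch conditions agree for ints.
public static class BranchEquivalence
{
    // Debug build: cgt; ldc.i4.0; ceq, then branch if the result is true.
    public static bool DebugJumps(int a, int b) => (a > b) == false;

    // Release build: a single ble.s, i.e. branch if a <= b.
    public static bool ReleaseJumps(int a, int b) => a <= b;

    public static bool Agree()
    {
        int[] samples = { int.MinValue, -2, -1, 0, 1, 2, int.MaxValue };
        foreach (int a in samples)
            foreach (int b in samples)
                if (DebugJumps(a, b) != ReleaseJumps(a, b))
                    return false;
        return true;
    }
}
```

BranchEquivalence.Agree() returns true, so both builds take the jump in exactly the same cases.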