c# il csc ildasm

Different IL generated when adding one more int variable


I have this program in c#:

using System;

class Program
{
    public static void Main()
    {
        int i = 4;
        double d = 12.34;
        double PI = Math.PI;
        string name = "Ehsan";


    }
}

When I compile it, the following IL is generated by the compiler for Main:

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       30 (0x1e)
  .maxstack  1
  .locals init (int32 V_0,
           float64 V_1,
           float64 V_2,
           string V_3)
  IL_0000:  nop
  IL_0001:  ldc.i4.4
  IL_0002:  stloc.0
  IL_0003:  ldc.r8     12.34
  IL_000c:  stloc.1
  IL_000d:  ldc.r8     3.1415926535897931
  IL_0016:  stloc.2
  IL_0017:  ldstr      "Ehsan"
  IL_001c:  stloc.3
  IL_001d:  ret
} // end of method Program::Main

That is fine and I understand it. Now, if I add another integer variable, something different is generated. Here is the modified C# code:

using System;

class Program
{
    public static void Main()
    {
        int unassigned;
        int i = 4;
        unassigned = i;
        double d = 12.34;
        double PI = Math.PI;
        string name = "Ehsan";


    }
}

and here is the IL generated for the above C# code:

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       33 (0x21)
  .maxstack  1
  .locals init (int32 V_0,
           int32 V_1,
           float64 V_2,
           float64 V_3,
           string V_4)
  IL_0000:  nop
  IL_0001:  ldc.i4.4
  IL_0002:  stloc.1
  IL_0003:  ldloc.1
  IL_0004:  stloc.0
  IL_0005:  ldc.r8     12.34
  IL_000e:  stloc.2
  IL_000f:  ldc.r8     3.1415926535897931
  IL_0018:  stloc.3
  IL_0019:  ldstr      "Ehsan"
  IL_001e:  stloc.s    V_4  // what is happening here in this case
  IL_0020:  ret
} // end of method Program::Main

Notice that a stloc.s instruction is now generated with V_4, which is a local, but I am not clear on why. I also don't understand the purpose of these locals, I mean these:

 .locals init (int32 V_0,
               float64 V_1,
               float64 V_2,
               string V_3)

Solution

  • Some things to note.

    First, this is presumably a debug build, or at least has certain optimisations turned off in the compilation. What I would expect to see here is:

    .method public hidebysig static void Main () cil managed 
    {
      .entrypoint
    
      IL_0000: ret
    }
    

    Which is to say, since those locals aren't used, I'd expect the compiler to just skip them entirely. It won't on a debug build, but this stands as a good example of how there can be considerable difference between what the C# says and what the IL says.

    The next thing to note is how an IL method is structured. You have an array of local values, which is defined with the .locals block, of various types. These will generally correspond pretty closely to what the C# had, though there'll often be short-cuts and re-arrangements made.

    Finally we have the set of instructions which all act upon those locals, any arguments, and a stack to which it can push, from which it can pop, and upon which various instructions will interact.

    The next thing to note is that the IL you see here is a sort of assembly for byte-code: Every instruction here has a one-to-one mapping to one or two bytes, and every value also consumes a certain number of bytes. So for example, stloc V_4 (not actually present in your examples, but we'll come to that) would map to 0xFE 0x0E 0x04 0x00 where 0xFE 0x0E is the encoding of stloc and 0x04 0x00 that of 4 which is the index of the local in question. It means "pop the value of the top of the stack, and store it in the 5th (index 4) local".

    Now, there are a few abbreviations here. One of these is the .s "short" form of several instructions (_S in the name of the equivalent System.Reflection.Emit.OpCodes field). These are variants of other instructions that take a one-byte value (signed or unsigned depending on the instruction) where the other form takes a two- or four-byte value, generally indices or relative distances to jump. So instead of stloc V_4 we can have stloc.s V_4 which is only 0x13 0x04, and so is smaller.

    Then there are some variants that include a particular value in the instruction. So instead of either stloc V_0 or stloc.s V_0 we can just use stloc.0 which is just the single byte 0x0A.
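
    The byte counts above can be checked against the opcode metadata that System.Reflection.Emit.OpCodes exposes. The following is a minimal sketch rather than anything a compiler actually runs; the class name StlocEncodings and the helper EncodedLength are made up for illustration, and the helper only handles the three operand kinds that the stloc family uses:

```csharp
using System;
using System.Reflection.Emit;

static class StlocEncodings
{
    // Hypothetical helper: total encoded length of one instruction,
    // i.e. the opcode bytes plus the operand bytes that follow it.
    public static int EncodedLength(OpCode op)
    {
        int operandBytes = op.OperandType switch
        {
            OperandType.InlineNone => 0,      // index baked into the opcode (stloc.0)
            OperandType.ShortInlineVar => 1,  // one-byte local index (stloc.s)
            OperandType.InlineVar => 2,       // two-byte local index (stloc)
            _ => throw new NotSupportedException(op.OperandType.ToString())
        };
        return op.Size + operandBytes;
    }

    static void Main()
    {
        foreach (OpCode op in new[] { OpCodes.Stloc, OpCodes.Stloc_S, OpCodes.Stloc_0 })
        {
            // op.Value is a short, so cast to ushort to print the raw opcode bytes.
            Console.WriteLine($"{op.Name,-8} opcode=0x{(ushort)op.Value:X} length={EncodedLength(op)}");
        }
    }
}
```

    Under those assumptions the long form comes out at four bytes, the short form at two, and stloc.0 at one.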

    This makes a lot of sense when you consider that it's common to only have a handful of locals in use at a time, so using either stloc.s or (better yet) the likes of stloc.0, stloc.1, etc. gives tiny savings that add up to quite a lot.

    But only so much. If we had e.g. stloc.252, stloc.253 etc. then there'd be a lot of such instructions, and the number of bytes needed for each instruction would have to be more, and it would overall be a loss. The super-short forms of the local-related (stloc, ldloc) and argument-related (ldarg) instructions only go up to 3. (There is a starg and starg.s but no starg.0 etc. as storing to arguments is relatively rare). ldc.i4/ldc.i4.s (push a constant 32-bit signed value onto the stack) has super-short versions going from ldc.i4.0 to ldc.i4.8 and also ldc.i4.m1 for -1.
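
    That selection rule is why the fifth local in the question gets stloc.s while the first four get stloc.0 to stloc.3. It can be sketched as a small function; this is only an illustration of the rule described above, not the compiler's actual code, and the names ShortestForm and Stloc are invented for the example:

```csharp
using System;

static class ShortestForm
{
    // Hypothetical helper: returns the textual CIL for storing to the local
    // at the given index, using the smallest encoding available.
    public static string Stloc(int index)
    {
        // The long form carries a two-byte unsigned index.
        if (index < 0 || index > ushort.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(index));

        if (index <= 3) return $"stloc.{index}";        // one byte in total
        if (index <= 255) return $"stloc.s V_{index}";  // two bytes in total
        return $"stloc V_{index}";                      // four bytes in total
    }

    static void Main()
    {
        foreach (int i in new[] { 0, 3, 4, 255, 256 })
            Console.WriteLine(Stloc(i));
    }
}
```

    Index 4 is the first one that no longer has a dedicated one-byte instruction, so it falls through to the stloc.s form, which matches the stloc.s V_4 in the second listing of the question.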

    It's also worth noting that the V_4 doesn't exist in your code at all. Whatever you examined the IL with didn't know you'd used the variable name "name", so it just used V_4. (What are you using, BTW? I use ILSpy for the most part, and if you had debug information associated with the file it would have called it name accordingly).

    So, to produce a commented non-shorted version of your method with more comparable names we could write the following CIL:

    .method public hidebysig static void  Main() cil managed
    {
      .entrypoint
      .maxstack  1
      .locals init (int32 unassigned,
               int32 i,
               float64 d,
               float64 PI,
               string name)
      nop                           // Do Nothing (helps debugger to have some of these around).
      ldc.i4   4                    // Push number 4 on stack
      stloc    i                    // Pop value from stack, put in i (i = 4)
      ldloc    i                    // Push value in i on stack
      stloc    unassigned           // Pop value from stack, put in unassigned (unassigned = i)
      ldc.r8   12.34                // Push the 64-bit floating value 12.34 onto the stack
      stloc    d                    // Pop the value from stack, put in d (d = 12.34)
      ldc.r8   3.1415926535897931   // Push the 64-bit floating value 3.1415926535897931 onto the stack.
      stloc    PI                   // Pop the value from stack, put in PI (PI = 3.1415… which is the constant Math.PI)
      ldstr    "Ehsan"              // Push the string "Ehsan" on stack
      stloc    name                 // Pop the value from stack, put in name
      ret                           // return.
    }
    

    That will behave pretty much as your code does, but be a bit larger. So we replace stloc with stloc.0 through stloc.3 where we can, with stloc.s where the index is too large for those but still fits in one byte, and ldc.i4 4 with ldc.i4.4, and we'll have shorter bytecode that does the same thing:

    .method public hidebysig static void  Main() cil managed
    {
      .entrypoint
      .maxstack  1
      .locals init (int32 unassigned,
               int32 i,
               float64 d,
               float64 PI,
               string name)
      nop                           // Do Nothing (helps debugger to have some of these around).
      ldc.i4.4                      // Push number 4 on stack
      stloc.1                       // Pop value from stack, put in i (i = 4)
      ldloc.1                       // Push value in i on stack
      stloc.0                       // Pop value from stack, put in unassigned (unassigned = i)
      ldc.r8   12.34                // Push the 64-bit floating value 12.34 onto the stack
      stloc.2                       // Pop the value from stack, put in d (d = 12.34)
      ldc.r8   3.1415926535897931   // Push the 64-bit floating value 3.1415926535897931 onto the stack.
      stloc.3                       // Pop the value from stack, put in PI (PI = 3.1415… which is the constant Math.PI)
      ldstr    "Ehsan"              // Push the string "Ehsan" on stack
      stloc.s  name                 // Pop the value from stack, put in name
      ret                           // return.
    }
    

    And now we've exactly the same code that your disassembly had, except that we've got better names. Remember, the names don't appear in the byte code, so the disassembler couldn't do as good a job as we can.
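
    You can see the absence of names for yourself through reflection, which exposes a method's locals only by index and type. The following is a rough sketch: LocalsDemo, Sample and DescribeLocals are names invented for this example, and which locals actually appear (and in what order) depends on the compiler and on whether it is a debug or release build, so no particular output is promised:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

static class LocalsDemo
{
    // Hypothetical sample method. The locals are used in the return value
    // so that the compiler is less likely to discard them.
    public static string Sample()
    {
        int i = 4;
        double d = 12.34;
        string name = "Ehsan";
        return name + i + d;
    }

    // Lists each local the way a disassembler without debug information
    // would have to: by index and type only, since no name is available.
    public static List<string> DescribeLocals(MethodBase method)
    {
        var result = new List<string>();
        MethodBody body = method.GetMethodBody();
        if (body == null) return result; // e.g. abstract or extern methods

        foreach (LocalVariableInfo local in body.LocalVariables)
            result.Add($"V_{local.LocalIndex}: {local.LocalType}");
        return result;
    }

    static void Main()
    {
        MethodInfo method = typeof(LocalsDemo).GetMethod(nameof(Sample));
        foreach (string line in DescribeLocals(method))
            Console.WriteLine(line);
    }
}
```

    LocalVariableInfo has no name property at all; names such as i, d and name live in the separate debug symbols, which is where a tool like ILSpy gets them when they are available.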


    Your question in a comment should really be another question, but it offers a chance to add something important that I only briefly noted above. Let's consider:

    public static void Maybe(int a, int b)
    {
      if (a > b)
        Console.WriteLine("Greater");
      Console.WriteLine("Done");
    }
    

    Compile in debug and you end up with something like:

    .method public hidebysig static 
      void Maybe (
        int32 a,
        int32 b
      ) cil managed 
    {
      .maxstack 2
      .locals init (
        [0] bool CS$4$0000
      )
    
      IL_0000: nop
      IL_0001: ldarg.0
      IL_0002: ldarg.1
      IL_0003: cgt
      IL_0005: ldc.i4.0
      IL_0006: ceq
      IL_0008: stloc.0
      IL_0009: ldloc.0
      IL_000a: brtrue.s IL_0017
    
      IL_000c: ldstr "Greater"
      IL_0011: call void [mscorlib]System.Console::WriteLine(string)
      IL_0016: nop
    
      IL_0017: ldstr "Done"
      IL_001c: call void [mscorlib]System.Console::WriteLine(string)
      IL_0021: nop
      IL_0022: ret
    }
    

    Now one thing to note is that the labels like IL_0017 etc. are added to every line by the disassembler, based on the byte offset of the instruction within the method. This makes life easier for the disassembler, but isn't really necessary unless a label is jumped to. Let's strip out all labels that aren't jumped to:

    .method public hidebysig static 
      void Maybe (
        int32 a,
        int32 b
      ) cil managed 
    {
      .maxstack 2
      .locals init (
        [0] bool CS$4$0000
      )
    
      nop
      ldarg.0
      ldarg.1
      cgt
      ldc.i4.0
      ceq
      stloc.0
      ldloc.0
      brtrue.s IL_0017
    
      ldstr "Greater"
      call void [mscorlib]System.Console::WriteLine(string)
      nop
    
      IL_0017: ldstr "Done"
      call void [mscorlib]System.Console::WriteLine(string)
      nop
      ret
    }
    

    Now, let's consider what each line does:

    .method public hidebysig static 
      void Maybe (
        int32 a,
        int32 b
      ) cil managed 
    {
      .maxstack 2
      .locals init (
        [0] bool CS$4$0000
      )
    
      nop                   // Do nothing
      ldarg.0               // Load first argument (index 0) onto stack.
      ldarg.1               // Load second argument (index 1) onto stack.
      cgt                   // Pop two values from stack, push 1 (true) if the first is greater
                            // than the second, 0 (false) otherwise.
      ldc.i4.0              // Push 0 onto stack.
      ceq                   // Pop two values from stack, push 1 (true) if the two are equal,
                            // 0 (false) otherwise.
      stloc.0               // Pop value from stack, store in first local (index 0)
      ldloc.0               // Load first local onto stack.
      brtrue.s IL_0017      // Pop value from stack. If it's non-zero (true) jump to IL_0017
    
      ldstr "Greater"       // Load string "Greater" onto stack.
    
                            // Call Console.WriteLine(string)
      call void [mscorlib]System.Console::WriteLine(string)
      nop                   // Do nothing
    
      IL_0017: ldstr "Done" // Load string "Done" onto stack.
                            // Call Console.WriteLine(string)
      call void [mscorlib]System.Console::WriteLine(string)
      nop                   // Do nothing
      ret                   // return
    }
    

    Let's write this back into C# in a very literal step-by-step way:

    public static void Maybe(int a, int b)
    {
      bool shouldJump = (a > b) == false;
      if (shouldJump) goto IL_0017;
      Console.WriteLine("Greater");
    IL_0017:
      Console.WriteLine("Done");
    }
    

    Try that and you'll see it does the same thing. The use of goto is because CIL doesn't really have anything like for or while or even blocks we can put after an if or else, it just has jumps and conditional jumps.
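
    The same applies to loops. As a rough illustration (this is a hand-written C# sketch of the jump structure, not decompiled output, and the method names are made up), a for loop and a goto-only version that mirrors the shape a compiler typically lowers it to:

```csharp
using System;

static class LoopLowering
{
    // Ordinary structured loop: sums 1..n.
    public static int SumWithLoop(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
            total += i;
        return total;
    }

    // The same logic using only an unconditional jump and a conditional
    // jump, which is all that CIL offers for control flow within a method.
    public static int SumWithGotos(int n)
    {
        int total = 0;
        int i = 1;
        goto Check;          // jump straight to the loop condition
    Body:
        total += i;
        i++;
    Check:
        if (i <= n) goto Body;
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumWithLoop(10));   // prints 55
        Console.WriteLine(SumWithGotos(10));  // prints 55
    }
}
```

    Both methods return the same value for any n; only the second one spells out the jumps that the first one leaves to the compiler.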

    But why does it bother to store the value (what I called shouldJump in my C# rewrite) rather than just act on it?

    It's just to make it easier to examine what is going on at each point if you are debugging. In particular, for a debugger to be able to stop at the point where a > b is worked out but not yet acted on then either a > b or its opposite (a <= b) needs to be stored.

    Debug builds tend to write CIL that spends a lot of time writing a record of what it just did, for that reason. With a release build we'd get something more like:

    .method public hidebysig static 
      void Maybe (
        int32 a,
        int32 b
      ) cil managed 
    {
      ldarg.0           // Load first argument onto stack
      ldarg.1           // Load second argument onto stack
      ble.s IL_000e     // Pop two values from stack. If the first is
                        // less than or equal to the second, goto IL_000e: 
      ldstr "Greater"   // Load string "Greater" onto stack.
                        // Call Console.WriteLine(string)
      call void [mscorlib]System.Console::WriteLine(string)
                        // Load string "Done" onto stack.
      IL_000e: ldstr "Done"
                        // Call Console.WriteLine(string)
      call void [mscorlib]System.Console::WriteLine(string)
      ret
    }
    

    Or to do a similar line-by-line write back into C#:

    public static void Maybe(int a, int b)
    {
      if (a <= b) goto IL_000e;
      Console.WriteLine("Greater");
    IL_000e:
      Console.WriteLine("Done");
    }
    

    So you can see how the release build is more concisely doing the same thing.