Search code examples
c#cil

How to implement C# foreach optimization in IL


In this answer and this GitHub issue (top item) there is a description of foreach optimization used by C# compiler.

Basically, instead of allocating IEnumerable<T>, the generated code calls GetEnumerator() and then MoveNext() on the returned object, always using a direct call and therefore avoiding boxing and virtual calls.

Is it possible to write the same logic in intermediary language? I am quite a beginner with IL, however am familiar with the Unsafe package and the way it works. I wonder if it is possible to write an unsafe method in IL that accepts some object and calls it methods and properties directly?

(Also, could someone kindly give a link to the line in the Roslyn repo where this foreach optimization happens? The repo is so big and complex, I am lost there so far.)

Update:

Here is a method template

[MethodImpl(MethodImplOptions.AggressiveInlining)]
[ILSub(@"
    .. IL code here to be replaced by ilasm.exe
    .. Is there a way to do the same without boxing and virtual calls?
    ")]
public T CallIEnumerableMoveNextViaIL<T>(IEnumerable<T> enumerable)
{
    // I know that the `enumerable` returns an enumerator that is a struct, but its type could be custom
    // Next two calls are virtual via an interface, and enumerator is boxed
    var enumerator = enumerable.GetEnumerator();
    enumerator.MoveNext();
    return enumerator.Current;
}

And here IL for that method:

IL_0000: ldarg.1
IL_0001: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> class [mscorlib]System.Collections.Generic.IEnumerable`1<!!T>::GetEnumerator()
IL_0006: dup
IL_0007: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
IL_000c: pop
IL_000d: callvirt instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1<!!T>::get_Current()
IL_0012: ret

From the comments it looks like it is impossible to call the methods such as get_Current() without knowing the type.


Solution

  • Let's have a couple of examples of how foreach gets compiled.
    First, on a regular IEnumerator-returning GetEnumerator:

    public class A : IEnumerable
    {
        public IEnumerator GetEnumerator()
        {
            throw new NotImplementedException();
        }
    }
    
    foreach(object o in new A())
    {
    
    }
    

      .locals init (class [mscorlib]System.Collections.IEnumerator V_0,
               class [mscorlib]System.IDisposable V_1)
      IL_0000:  newobj     instance void Testing.Program/A::.ctor()
      IL_0005:  call       instance class [mscorlib]System.Collections.IEnumerator Testing.Program/A::GetEnumerator()
      IL_000a:  stloc.0
      .try
      {
        IL_000b:  br.s       IL_0014
        IL_000d:  ldloc.0
        IL_000e:  callvirt   instance object [mscorlib]System.Collections.IEnumerator::get_Current()
        IL_0013:  pop
        IL_0014:  ldloc.0
        IL_0015:  callvirt   instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
        IL_001a:  brtrue.s   IL_000d
        IL_001c:  leave.s    IL_002f
      }  // end .try
      finally
      {
        IL_001e:  ldloc.0
        IL_001f:  isinst     [mscorlib]System.IDisposable
        IL_0024:  stloc.1
        IL_0025:  ldloc.1
        IL_0026:  brfalse.s  IL_002e
        IL_0028:  ldloc.1
        IL_0029:  callvirt   instance void [mscorlib]System.IDisposable::Dispose()
        IL_002e:  endfinally
      }  // end handler
    

    Nothing wondrous here, just calling the methods on IEnumerator. Notice that it also implements IDisposable, hence it uses its pattern.

    Next, let's have a value type-returning GetEnumerator:

    public class B
    {
        public BE GetEnumerator()
        {
            return new BE();
        }
    
        public struct BE
        {
            public object Current {
                get {
                    throw new NotImplementedException();
                }
            }
    
            public bool MoveNext()
            {
                throw new NotImplementedException();
            }
    
            public void Reset()
            {
                throw new NotImplementedException();
            }
        }
    }
    

      .locals init (class [mscorlib]System.Collections.IEnumerator V_0,
               class [mscorlib]System.IDisposable V_1,
               valuetype Testing.Program/B/BE V_2,
               object[] V_3,
               int32 V_4)
      IL_002f:  newobj     instance void Testing.Program/B::.ctor()
      IL_0034:  call       instance valuetype Testing.Program/B/BE Testing.Program/B::GetEnumerator()
      IL_0039:  stloc.2
      IL_003a:  br.s       IL_0044
      IL_003c:  ldloca.s   V_2
      IL_003e:  call       instance object Testing.Program/B/BE::get_Current()
      IL_0043:  pop
      IL_0044:  ldloca.s   V_2
      IL_0046:  call       instance bool Testing.Program/B/BE::MoveNext()
      IL_004b:  brtrue.s   IL_003c
    

    Notice that nothing here has to implement IEnumerable/IEnumerator interfaces. This method is sometimes called duck-typing. This implementation internally stores the enumerator in a variable and then calls the methods on its address (ldloca). One value type copying happens here, when the enumerator is returned from GetEnumerator.

    A third example is almost a completely different thing, foreach on an array:

    foreach(object o in new object[0])
    {
    
    }
    
      .locals init (class [mscorlib]System.Collections.IEnumerator V_0,
               class [mscorlib]System.IDisposable V_1,
               valuetype Testing.Program/B/BE V_2,
               object[] V_3,
               int32 V_4)  IL_004d:  ldc.i4.0
      IL_004e:  newarr     [mscorlib]System.Object
      IL_0053:  stloc.3
      IL_0054:  ldc.i4.0
      IL_0055:  stloc.s    V_4
      IL_0057:  br.s       IL_0064
      IL_0059:  ldloc.3
      IL_005a:  ldloc.s    V_4
      IL_005c:  ldelem.ref
      IL_005d:  pop
      IL_005e:  ldloc.s    V_4
      IL_0060:  ldc.i4.1
      IL_0061:  add
      IL_0062:  stloc.s    V_4
      IL_0064:  ldloc.s    V_4
      IL_0066:  ldloc.3
      IL_0067:  ldlen
      IL_0068:  conv.i4
      IL_0069:  blt.s      IL_0059
    

    This uses no GetEnumerator, and simply traverses the array in the old-fashioned index way (higher performance).

    You cannot use this exact optimization in CIL, because CIL has no duck-typing; you have to write all the signatures and method calls yourself.

    However, if you need to have this optimization for any type, and you can modify the types you want to use, you can use generic interfaces in a code similar to this:

    public class B : IStructEnumerable<object, BE>
    {
        public BE GetEnumerator()
        {
            return new BE();
        }
    }
    
    public struct BE : IStructEnumerator<object>
    {
        public object Current {
            get {
                throw new NotImplementedException();
            }
        }
    
        public bool MoveNext()
        {
            throw new NotImplementedException();
        }
    
        public void Reset()
        {
            throw new NotImplementedException();
        }
    }
    
    public interface IStructEnumerable<TItem, TEnumerator> where TEnumerator : struct, IStructEnumerator<TItem>
    {
        TEnumerator GetEnumerator();
    }
    
    public interface IStructEnumerator<TItem>
    {
        TItem Current {get;}
        bool MoveNext();
        void Reset();
    }
    
    public static void TestEnumerator<TEnumerable, TEnumerator>(TEnumerable b) where TEnumerable : IStructEnumerable<object, TEnumerator> where TEnumerator : struct, IStructEnumerator<object>
    {
        foreach(object obj in b)
        {
    
        }
    }