Why does C# implement anonymous methods and closures as instance methods, rather than as static methods?

As I'm not exactly an expert on programming languages I'm well aware this may be a stupid question, but as best as I can tell C# handles anonymous methods and closures by making them into instance methods of an anonymous nested class [1], instantiating this class, and then pointing delegates at those instance methods.

It appears that this anonymous class can only ever be instantiated once (or am I wrong about that?), so why not have the anonymous class be static instead?

[1] Actually, it looks like there's one class for closures and one for anonymous methods that don't capture any variables, which I don't entirely understand the rationale for either.

Solution

> I'm well aware this may be a stupid question

It's not.

> C# handles anonymous methods and closures by making them into instance methods of an anonymous nested class, instantiating this class, and then pointing delegates at those instance methods.

C# does that sometimes.

> It appears that this anonymous class can only ever be instantiated once (or am I wrong about that?), so why not have the anonymous class be static instead?

In cases where that would be legal, C# does you one better. It doesn't make a closure class at all. It makes the anonymous function a static function of the current class.

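And yes, you are wrong that the closure class can only ever be instantiated once; the C3 example below needs a fresh instance on every call. But in cases where you can get away with only allocating the delegate once, C# does get away with it.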

(This is not strictly speaking entirely true; there are some obscure cases where this optimization is not implemented. But for the most part it is.)

> Actually, it looks like there's one class for closures and one for anonymous methods that don't capture any variables, which I don't entirely understand the rationale for either.

You have put your finger on the thing you don't adequately understand.

Let's look at some examples:

class C1
{
  Func<int, int, int> M()
  {
    return (x, y) => x + y;
  }
}

This can be generated as

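class C1
{
  // Cached delegate, created at most once for the whole program.
  static Func<int, int, int> theFunction;
  // The lambda body, hoisted into an ordinary static method.
  static int Anonymous(int x, int y) { return x + y; }
  Func<int, int, int> M()
  {
    if (C1.theFunction == null) C1.theFunction = C1.Anonymous;
    return C1.theFunction;
  }
}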

No new class needed.
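
To make the caching concrete, here is a minimal sketch that hand-writes the lowered form and lets you check that every call hands back the same delegate object. The name C1Lowered and the public modifier are illustrative assumptions; the real compiler uses names you cannot type and may pick a different but equivalent shape.

```csharp
using System;

// Hand-written stand-in for what the compiler could generate for C1.
class C1Lowered
{
    // One delegate for the whole program, created lazily on first use.
    static Func<int, int, int> theFunction;

    // The lambda body as a plain static method: it needs no `this`.
    static int Anonymous(int x, int y) { return x + y; }

    public Func<int, int, int> M()
    {
        if (theFunction == null) theFunction = Anonymous;
        return theFunction;
    }
}
```

Calling M on two different C1Lowered instances and comparing the results with object.ReferenceEquals gives true, because the delegate does not depend on which instance M was called on.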

Now consider:

class C2
{
  static int counter = 0;
  int x = counter++;
  Func<int, int> M()
  {
    return y => this.x + y;
  }
}

Do you see why this cannot be generated with a static function? The static function would need access to this.x but where is the this in a static function? There isn't one.

So this one has to be an instance function:

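class C2
{
  static int counter = 0;
  int x = counter++;
  // The lambda body becomes an instance method so that it can read this.x.
  int Anonymous(int y) { return this.x + y; }
  Func<int, int> M()
  {
    // The delegate is bound to this particular instance.
    return this.Anonymous;
  }
}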

Also, we can no longer cache the delegate in a static field; do you see why?
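
As a hint, the following sketch hand-writes the lowered form and inspects what each delegate is bound to. C2Lowered and C2Demo are hypothetical names, and the public modifiers are there only so the class can be exercised from outside.

```csharp
using System;

class C2Lowered
{
    static int counter = 0;
    int x = counter++;   // each new instance gets the next counter value

    int Anonymous(int y) { return this.x + y; }

    public Func<int, int> M()
    {
        // The delegate stores `this` as its target.
        return this.Anonymous;
    }
}

static class C2Demo
{
    public static bool TargetsDiffer()
    {
        var first = new C2Lowered();
        var second = new C2Lowered();
        Func<int, int> f = first.M();
        Func<int, int> g = second.M();
        // Each delegate is tied to the instance that created it,
        // so the two delegates compute different results.
        return object.ReferenceEquals(f.Target, first)
            && object.ReferenceEquals(g.Target, second)
            && f(10) != g(10);
    }
}
```

C2Demo.TargetsDiffer() returns true: the two delegates have different targets and give different answers for the same argument, which is worth keeping in mind when thinking about where such a delegate could be cached.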

Exercise: could the delegate be cached in an instance field? If no, then what prevents this from being legal? If yes, what are some arguments against implementing this "optimization"?

Now consider:

class C3
{
  static int counter = 0;
  int x = counter++;
  Func<int> M(int y)
  {
    return () => x + y;
  }
}

This cannot be generated as an instance function of C3; do you see why? We need to be able to say:

var a = new C3();
var b = a.M(123);
var c = b(); // 123 + 0
var d = new C3();
var e = d.M(456);
var f = e(); // 456 + 1
var g = a.M(789);
var h = g(); // 789 + 0

Now the delegates need to know not just the value of this.x but also the value of y that was passed in. That has to be stored somewhere, so we store it in a field. But it can't be a field of C3, because then how do we tell b to use 123 and g to use 789 for the value of y? They have the same instance of C3 but two different values for y.

class C3
{
  static int counter = 0;
  int x = counter++;
  // One Locals object per call to M holds everything the lambda captured.
  class Locals
  {
    public C3 __this;
    public int __y;
    public int Anonymous() { return this.__this.x + this.__y; }
  }
  Func<int> M(int y)
  {
    var locals = new Locals();
    locals.__this = this;
    locals.__y = y;
    return locals.Anonymous;
  }
}
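
To see that this shape gives the behavior required above, here is a runnable sketch that replays the a/b/d/e/g scenario against a hand-written lowering. C3Lowered and C3Demo are hypothetical names, and M is made public only so it can be called from the demo.

```csharp
using System;

class C3Lowered
{
    static int counter = 0;
    int x = counter++;

    // One Locals object per call to M; it holds the captured this and y.
    class Locals
    {
        public C3Lowered __this;
        public int __y;
        public int Anonymous() { return this.__this.x + this.__y; }
    }

    public Func<int> M(int y)
    {
        var locals = new Locals();
        locals.__this = this;
        locals.__y = y;
        return locals.Anonymous;
    }
}

static class C3Demo
{
    public static int[] Run()
    {
        var a = new C3Lowered();
        var d = new C3Lowered();   // d.x is exactly a.x + 1
        Func<int> b = a.M(123);
        Func<int> e = d.M(456);
        Func<int> g = a.M(789);
        // Subtract a.x so the results do not depend on how many
        // instances were created before this method ran.
        int baseX = b() - 123;
        return new[] { b() - baseX, e() - baseX, g() - baseX };
    }
}
```

C3Demo.Run() returns 123, 457 and 789: b and g share the same C3Lowered instance but each has its own Locals object, so each remembers its own y.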

Exercise: Now suppose we have C4<T> with a generic method M<U> where the lambda is closed over variables of types T and U. Describe the codegen that has to happen now.

Exercise: Now suppose we have M return a tuple of delegates, one being () => x + y and the other being (int newY) => { y = newY; }. Describe the codegen for the two delegates.
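
Before working out the codegen, it may help to pin down the behavior it has to reproduce. The sketch below uses ordinary lambdas rather than lowered code; SharedCapture and SharedCaptureDemo are hypothetical names, and the fixed value of x is an assumption chosen to keep the arithmetic easy to follow.

```csharp
using System;

class SharedCapture
{
    int x = 1000;

    public (Func<int> Get, Action<int> Set) M(int y)
    {
        // Both lambdas capture the variable y itself, not a copy of its value.
        Func<int> get = () => x + y;
        Action<int> set = (int newY) => { y = newY; };
        return (get, set);
    }
}

static class SharedCaptureDemo
{
    public static (int Before, int After) Run()
    {
        var pair = new SharedCapture().M(1);
        int before = pair.Get();   // 1000 + 1
        pair.Set(5);               // writes the y that the getter reads
        int after = pair.Get();    // 1000 + 5
        return (before, after);
    }
}
```

SharedCaptureDemo.Run() returns (1001, 1005), so whatever the generated code looks like, the write made through the setter must be visible to the getter.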

Exercise: Now suppose M(int y) returns type Func<int, Func<int, int>> and we return a => b => this.x + y + z + a + b, where z is presumably another local variable of M. Describe the codegen.

Exercise: Suppose a lambda closed over both this and a local makes a non-virtual base call. For security reasons, it is illegal to make a base call from code inside a type that is not directly in the type hierarchy of the virtual method. Describe how to generate verifiable code in this case.

Exercise: Put 'em all together. How do you do codegen for multiple nested lambdas with getter and setter lambdas for all locals, parameterized by generic types at the class and method scope, that do base calls? Because that's the problem that we actually had to solve.