Search code examples
c#performancelambda

Obscure compiler's lambda expression translation


I studied Y Combinator (using C# 5.0) and was quite surprised when this method:

public static  Func<T1, Func<T2, TOut>> Curry<T1, T2, TOut> ( this Func<T1, T2, TOut> f)
{
    return a => b => f(a, b);
}

... was translated by the compiler to this:

public static Func<T1, Func<T2, TOut>> Curry<T1, T2, TOut>(this Func<T1, T2, TOut> f)
{
    first<T1, T2, TOut> local = new first<T1, T2, TOut>();
    local.function = f;
    return new Func<T1, Func<T2, TOut>>(local.Curry);
}

private sealed class first<T1, T2, TOut>
{
    private sealed class second
    {
        public first<T1, T2, TOut> ancestor;
        public T1 firstParameter;
        public TOut Curry(T2 secondParameter)
        {
            return ancestor.function(firstParameter, secondParameter);
        }
    }
    public Func<T1, T2, TOut> function;
    public Func<T2, TOut> Curry(T1 firstParameter)
    {
        second local = new second();
        local.ancestor = this;
        local.firstParameter = firstParameter;
        return new Func<T2, TOut>(local.Curry);
    }
}

So, second class is nested and first class isn't available for garbage collection while we use delegate that references to second.Curry. At the same time all that we need in first class is function. May be we can copy it (delegate) to the second class and then the first class could be collected? Yes, we also should then do second class non-nested but it seems that is ok. As I know delegates are copied "by value" so I can suggest that it is quite slow, but at the same time we copy firstParameter?! So, may be anyone could explain, why the compiler do all this things?) I speak about anything like this:

private sealed class first<T1, T2, TOut>
{
    public Func<T1, T2, TOut> function;
    public Func<T2, TOut> Curry(T1 firstParameter)
    {
        second<T1, T2, TOut> local = new second<T1, T2, TOut>();
        local.function = function;
        local.firstParameter = firstParameter;
        return new Func<T2, TOut>(local.Curry);
    }
}

public sealed class second<T1, T2, TOut>
{
    public T1 firstParameter;
    public Func<T1, T2, TOut> function;
    public TOut Curry(T2 secondParameter)
    {
        return function(firstParameter, secondParameter);
    }
}

Solution

  • The question is difficult to understand. Let me clarify it. Your suggestion is that the compiler could instead generate

    public static Func<T1, Func<T2, TOut>> Curry<T1, T2, TOut>(this Func<T1, T2, TOut> f)
    {
        first<T1, T2, TOut> local = new first<T1, T2, TOut>();
        local.function = f;
        return new Func<T1, Func<T2, TOut>>(local.Curry);
    }
    private sealed class first<T1, T2, TOut>
    {
        private sealed class second
        {
            //public first<T1, T2, TOut> ancestor;
            public Func<T1, T2, TOut> function;
            public T1 firstParameter;
            public TOut Curry(T2 secondParameter)
            {
                return /*ancestor.*/function(firstParameter, secondParameter);
            }
        }
        // public Func<T1, T2, TOut> function;
        public Func<T2, TOut> Curry(T1 firstParameter)
        {
            second local = new second();
            // local.ancestor = this;
            local.function = function;
            local.firstParameter = firstParameter;
            return new Func<T2, TOut>(local.Curry);
        }
    }
    

    Yes?

    Your claim is that this is an improvement because of this scenario.

    Func<int, int, int> adder = (x, y)=>x+y;
    Func<int, Func<int, int>> makeAdder = adder.Curry();
    Func<int, int> addFive = makeAdder(5);
    
    • addFive is a delegate of a method on an instance of second
    • makeAdder is a delegate of a method on an instance of first
    • In the original codegen, second holds on to ancestor, which is that same instance of first

    Therefore if we say

    makeAdder = null;
    

    then the instance of first cannot be collected. The instance is not reachable via makeAdder anymore, but it is reachable via addFive.

    In the proposed codegen, first can be collected in this scenario because the instance is not reachable via addFive.

    You are correct that in this particular scenario that optimization would be legal. However it would not in general be legal for the reason that Ben Voigt describes in his answer. If f is mutated inside Curry then local.function has to mutate. But local has no access to an instance of second until the outer delegate is executed.

    The C# compiler team could choose to do the optimization you've identified, but the tiny savings has thus far simply not worth the bother.

    We were considering for Roslyn making an optimization along the lines that you describe; that is, if the outer variable is known not to mutate then capture its value more aggressively. I do not know whether that optimization made it in to Roslyn or not.