Search code examples
c#compiler-constructioncovariancecontravariance

How can compiler compute automatically co- and contravariance?


Please note this is a question about internals of compilers.

I just read [1] that when introducing variance for generic types C# team was thinking whether they should automatically compute if the type is co- or contravariant. Of course this is a history now, but nevertheless I am wondering how could this be done?

Is taking all methods (excluding constructors) and checking if the type is in in or out position enough?

[1] Jeffrey Richter, CLR via C#, 4th edition, p.281.


Solution

  • The link in the now-deleted answer is to my article that explains the exact rules for determining variance validity, which does not answer your question. The link you're actually looking for is to my article on why the C# compiler team rejected attempting to compute variance without any syntax, which is here:

    http://blogs.msdn.com/b/ericlippert/archive/2007/10/29/covariance-and-contravariance-in-c-part-seven-why-do-we-need-a-syntax-at-all.aspx

    Briefly, the reasons for rejecting such a feature are:

    • The feature requires whole-program analysis. Not only is this expensive, it means that small changes in one type can cause the variance choices of many far-away types to change unexpectedly.
    • Variance is something that you want to design in to a type; it is a statement of how you expect the type to be used by its users not just today, but forever. That expectation should be encoded into the program text.
    • There are plenty of cases where it is very difficult to compute the intention of the user, and then what do you do? You have to resolve it by requiring a syntax, and so why not just require it all the time? For example:

    interface I<V, W> 
    { 
         I<V, W> M(I<W, V> x);
    }
    

    As an exercise, compute what all the possible valid variance annotations are on V and W. Now, how should the compiler do the same computation? What algorithm did you use? And second, given that this is ambiguous, how would you choose to resolve the ambiguity?

    Now, I note that this answer thus far also does not answer your question. You asked how it could be done, and all I gave you was reasons why we shouldn't make the attempt to do it. There are many ways it could be done.

    For example, take every generic type in the program, and every type parameter of those generic types. Suppose there are a hundred of them. Then there are only three-to-the-hundred possible combinations of invariant, in and out for each; try all of them, see which ones work, and then have a ranking function that chooses from the winners. The problem there of course is that it takes longer than the age of the universe to run.

    Now, we could apply a smart pruning algorithm to say "any choice where T is in and is also known to be used in an output position is invalid", so don't check any of those cases. Now we have a situation where we have hundreds of such predicates that must all be applied in order to determine what the applicable set of variance validities are. As I noted in my example above, it can be quite tricky to figure out when something is actually in an input or output position. So this is probably a non-starter as well.

    Ah, but that idea implies that analysis of predicates about an algebraic system is a potentially good technique. We could build an engine that generates predicates and then applies a sophisticated SMT solver to it. That would have bad cases that require gazillions of computations, but modern SMT solvers are pretty good in typical cases.

    But all this is way, way too much work to do for a feature that has almost no value to the user.