Tags: c#, generics, bytecode, semantics, cil

In C# and ECMA-CIL, can a struct-instantiated generic be implemented using boxing?


ECMA-CIL allows each instantiation of a generic definition to get its own implementation: the runtime may specialize the code based on the chosen generic arguments.

Is there any case where a generic may behave differently if instantiated by a struct instead of an object reference? This is a question regarding semantics; I am not talking about performance.

In other words, could a naive implementation of ECMA-CIL decide to implement struct-instantiated generics as boxed values (as in Java)?

I read ECMA-CIL, but I'm still not sure about this. Any feedback is more than appreciated. Although I'm particularly interested in what happens at the bytecode level, an answer from the C# language perspective is also valuable.


Solution

  • Here's a simulated boxed value type in C#:

    public class Boxed<T> where T : struct
    {
        public T Value; // do not assign to!
    
        public Boxed(T value) => Value = value;
    }
    

    This is the best we can do at the .NET level, since there is no native way to declare a reference to a boxed value type (C++/CLI uses a tagged object type to express that). It is also more or less equivalent to System.Runtime.CompilerServices.StrongBox<T>.
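    To make the comparison concrete, here is a small sketch using the built-in StrongBox<T>, which exposes a public Value field and a constructor taking the initial value. The class StrongBoxDemo and its method are hypothetical names for illustration. It shows the key difference from a real value type: copying the box copies only the reference, so a mutation through one variable is visible through the other.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical demo class: compares copying a plain int with copying a StrongBox<int>.
public static class StrongBoxDemo
{
    public static (int plain, int boxed) CompareCopies()
    {
        int plain = 1;
        int plainCopy = plain;      // copies the value
        plainCopy = 2;              // 'plain' is unaffected

        var box = new StrongBox<int>(1);
        var boxCopy = box;          // copies only the reference
        boxCopy.Value = 2;          // the change is also visible through 'box'

        return (plain, box.Value);  // (1, 2)
    }
}
```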

    "Is there any case where a generic may behave differently if instantiated by a struct instead of an object reference? This is a question regarding semantics; I am not talking about performance."

    Of course, using Boxed<T> means that, for example, default(Boxed<T>) is null rather than a zero-initialized value. Here is a situation where this could be an issue:

    public static class GlobalVariable<T>
    {
        public static T Value; // a static class can only declare static members
    }
    

    If .NET actually implemented GlobalVariable<int> as GlobalVariable<Boxed<int>>, the Value field would contain null instead of 0. A conforming CLI implementation would have to either implicitly run Value = new Boxed<T>(default(T)); when the type is initialized, or change the field-load instructions (ldsfld/ldsflda for this static field, ldfld/ldflda for instance fields) to create such an instance whenever they encounter null, which would wreak havoc on thread safety and readonly fields.
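    The difference can be observed directly. The following sketch declares GlobalVariable<T> with a static Value field and repeats Boxed<T> so that it compiles on its own; BoxedGlobalVariable<T> and DefaultDemo are hypothetical names standing in for what a naive runtime might generate. It compares the real int instantiation, a boxed lowering without any fix, and a boxed lowering that eagerly initializes the field with new Boxed<T>(default(T)).

```csharp
using System;

// Same shape as Boxed<T> in the answer, repeated so this compiles on its own.
public class Boxed<T> where T : struct
{
    public T Value;
    public Boxed(T value) => Value = value;
}

// Static fields are zero-initialized: 0 for int, null for a class type.
public static class GlobalVariable<T>
{
    public static T Value;
}

// Hypothetical lowering with the fix: the box is created with default(T)
// when the type is initialized, so it is never observed as null.
public static class BoxedGlobalVariable<T> where T : struct
{
    public static Boxed<T> Value = new Boxed<T>(default(T));
}

public static class DefaultDemo
{
    // Genuine value-type instantiation: reads 0.
    public static int ReadDirect() => GlobalVariable<int>.Value;

    // Naive boxed lowering without the fix: the field is null, so unwrapping throws.
    public static bool NaiveBoxedReadThrows()
    {
        try
        {
            _ = GlobalVariable<Boxed<int>>.Value.Value;
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    // Boxed lowering with eager initialization: reads 0 again.
    public static int ReadFixedBoxed() => BoxedGlobalVariable<int>.Value.Value;
}
```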

    Another issue is the copy semantics of value types. Opcodes such as ldloc, stloc, or ldsfld assume that a value type is copied, so mutating the copy must not affect the source. For example:

    
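    public static class GlobalVariable<T>
    {
        public static T Value;
    
        public delegate void Mutator(ref T value);
    
        // Returns a mutated copy of Value; Value itself must stay unchanged.
        public static T MutateCopy(Mutator mutator)
        {
            var copy = Value;   // for a struct T, this copies the whole value
            mutator(ref copy);  // mutates only the local copy
            return copy;
        }
    }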
    

    If the runtime implemented GlobalVariable<ValueTuple<int>> as GlobalVariable<Boxed<ValueTuple<int>>>, calling MutateCopy would break: var copy = Value; would copy only the reference, so if the runtime translated mutator(ref copy) into something like mutator(ref copy.Value) in order to call the delegate, the mutator would modify the very instance that Value refers to. To fix this, the runtime would have to "clone" the boxed object during the assignment.
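    Here is a sketch of the three situations, using locals instead of the static field so that each case starts from a fresh value. CopyDemo, its Mutator delegate, Increment, and the method names are hypothetical; Boxed<T> is repeated so the snippet compiles on its own. With a genuine ValueTuple<int>, mutating the copy leaves the source alone; with the naive boxed lowering, the source changes; cloning the box on assignment restores value semantics.

```csharp
using System;

// Same shape as Boxed<T> in the answer, repeated so this compiles on its own.
public class Boxed<T> where T : struct
{
    public T Value;
    public Boxed(T value) => Value = value;
}

public static class CopyDemo
{
    public delegate void Mutator<T>(ref T value);

    // The mutator used in all three cases: increments the tuple's only field.
    static readonly Mutator<ValueTuple<int>> Increment = (ref ValueTuple<int> t) => t.Item1++;

    // Genuine value-type semantics: 'copy' is an independent struct.
    public static int SourceAfterRealCopy()
    {
        var source = new ValueTuple<int>(1);
        var copy = source;                  // copies the struct
        Increment(ref copy);
        return source.Item1;                // 1: source untouched
    }

    // Naive boxed lowering: assignment copies only the reference, and
    // mutator(ref copy) has become mutator(ref copy.Value).
    public static int SourceAfterNaiveBoxedCopy()
    {
        var source = new Boxed<ValueTuple<int>>(new ValueTuple<int>(1));
        var copy = source;                  // same box
        Increment(ref copy.Value);
        return source.Value.Item1;          // 2: source was mutated
    }

    // The fix: clone the box on assignment.
    public static int SourceAfterClonedBoxedCopy()
    {
        var source = new Boxed<ValueTuple<int>>(new ValueTuple<int>(1));
        var copy = new Boxed<ValueTuple<int>>(source.Value); // fresh box, copied payload
        Increment(ref copy.Value);
        return source.Value.Item1;          // 1: copy semantics restored
    }
}
```

    The clone only copies the payload shallowly, which matches how struct assignment already treats any reference-type fields inside the struct.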


    Summed up, a value-type type argument *could* be implemented using boxing, but without any special treatment you get the same issues you get in Java with wrapper classes. It could work as long as there are no mutable value types and no strict null-conversion checks (and potentially no byref parameters), but supporting anything beyond that would require additional, complicated changes to the runtime or the languages. For an example of such special treatment, see the "unboxed" reference types of C++/CLI (reference types used without the hat ^, which have value semantics).
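    One concrete instance of the null-check problem: in C#, comparing a value of an unconstrained type parameter T with null is legal and always yields false when T is a non-nullable value type. Under a naive boxed lowering, default(int) would be represented as a null Boxed<int>, and the same generic code would report true instead. The sketch below (NullDemo and its methods are hypothetical names) contrasts the two cases.

```csharp
// Same shape as Boxed<T> in the answer, repeated so this compiles on its own.
public class Boxed<T> where T : struct
{
    public T Value;
    public Boxed(T value) => Value = value;
}

public static class NullDemo
{
    // Legal for unconstrained T; always false when T is a non-nullable value type.
    public static bool IsNull<T>(T value) => value == null;

    // Real instantiation: default(int) is 0, never null.
    public static bool RealDefaultIsNull() => IsNull(default(int));

    // Naive boxed lowering: default(Boxed<int>) is a null reference.
    public static bool NaiveBoxedDefaultIsNull() => IsNull(default(Boxed<int>));
}
```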