Tags: c#, performance, operator-overloading, operators, commutativity

How should commutative operator overloads be efficiently implemented in C#?


Say I have a type Vector3f with an overloaded operator * that allows multiplication by a double:

public readonly struct Vector3f
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3f(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    public static Vector3f operator *(in Vector3f v, in double d) => new Vector3f(d * v.X, d * v.Y, d * v.Z);
}

With only the one overload, expressions such as new Vector3f(1,2,3) * 1.5 compile but 1.5 * new Vector3f(1,2,3) does not. Since vector-scalar multiplication is commutative, I would like either order to work, so I add another overload with the parameters reversed that just calls the original overload:

public static Vector3f operator *(in double d, in Vector3f v) => v * d;

Is that the correct way to do things? Should the second overload be implemented as

public static Vector3f operator *(in double d, in Vector3f v) => new Vector3f(d * v.X, d * v.Y, d * v.Z);

instead? Naively I'd expect the compiler to optimise the "extra" call away and always use the first overload if possible (or maybe replace the body of the short overload with that of the long one), but I don't know the behaviour of the C# compiler well enough to say either way.

I realise that in many cases this is the sort of performance quibble that is dwarfed by algorithm choice, but in some cases squeezing every last drop of performance is critical. In performance-critical cases, should commutative operator overloads be implemented as two overloads that are identical except for the order of the parameters, or is it just as efficient to have one delegate to the other?


Solution

  • Here you can see the difference between the two approaches.

    Please remember that this is IL and not the final assembly code generated after JIT optimizations.

    1. "implemented as two overloads that are identical except for the order of the parameters"

    The generated IL in this case is below.

    .method public hidebysig specialname static 
            valuetype lib.Vector3f  op_Multiply([in] float64& d,
                                                [in] valuetype lib.Vector3f& v) cil managed
    {
      .param [1]
      .custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 ) 
      .param [2]
      .custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 ) 
      // Code size       33 (0x21)
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  ldind.r8
      IL_0002:  ldarg.1
      IL_0003:  call       instance float64 lib.Vector3f::get_X()
      IL_0008:  mul
      IL_0009:  ldarg.0
      IL_000a:  ldind.r8
      IL_000b:  ldarg.1
      IL_000c:  call       instance float64 lib.Vector3f::get_Y()
      IL_0011:  mul
      IL_0012:  ldarg.0
      IL_0013:  ldind.r8
      IL_0014:  ldarg.1
      IL_0015:  call       instance float64 lib.Vector3f::get_Z()
      IL_001a:  mul
      IL_001b:  newobj     instance void lib.Vector3f::.ctor(float64,
                                                             float64,
                                                             float64)
      IL_0020:  ret
    } // end of method Vector3f::op_Multiply
    
    2. "or is it just as efficient to have one delegate to the other?"

    Here you can see the overhead of calling the *(v,d) operator from inside the *(d,v) operator:

    .method public hidebysig specialname static 
            valuetype lib.Vector3f  op_Multiply([in] float64& d,
                                                [in] valuetype lib.Vector3f& v) cil managed
    {
      .param [1]
      .custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 ) 
      .param [2]
      .custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 ) 
      // Code size       8 (0x8)
      .maxstack  8
      IL_0000:  ldarg.1
      IL_0001:  ldarg.0
      IL_0002:  call       valuetype lib.Vector3f lib.Vector3f::op_Multiply(valuetype lib.Vector3f&,
                                                                            float64&)
      IL_0007:  ret
    } // end of method Vector3f::op_Multiply
    

    At the IL level, the delegating version of course executes everything the direct version does, plus the extra argument loads and the call. If that is what you want to avoid, put the same code in both of your operators.

    You can also try adding a Multiply(Vector3f v, double d) method, decorating it with [MethodImpl(MethodImplOptions.AggressiveInlining)], and calling it from both operators, then hope for the best. The inlining will not show up in the IL, but the JIT will probably inline the Multiply() code.
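
A minimal sketch of that shared-helper idea might look like the following. The helper is made private and takes the vector by `in` reference here; both of those are assumptions of this sketch rather than part of the suggestion above. AggressiveInlining is only a hint, so whether the call actually disappears has to be checked in the JIT output, not in the IL.

```csharp
using System.Runtime.CompilerServices;

public readonly struct Vector3f
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3f(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Shared implementation used by both operator overloads.
    // Assumption of this sketch: private, with the vector passed by 'in' reference.
    // AggressiveInlining asks the JIT to inline the method; it does not guarantee it.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static Vector3f Multiply(in Vector3f v, double d)
        => new Vector3f(d * v.X, d * v.Y, d * v.Z);

    // Both parameter orders forward to the same helper.
    public static Vector3f operator *(in Vector3f v, in double d) => Multiply(in v, d);
    public static Vector3f operator *(in double d, in Vector3f v) => Multiply(in v, d);
}
```

With this in place, both new Vector3f(1, 2, 3) * 1.5 and 1.5 * new Vector3f(1, 2, 3) compile and yield the same components, and the multiplication logic lives in one place.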

    Maybe masters will have more to say on this.