Say I have a type Vector3f with an overloaded operator * that allows multiplication by a double:
public readonly struct Vector3f
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3f(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    public static Vector3f operator *(in Vector3f v, in double d) => new Vector3f(d * v.X, d * v.Y, d * v.Z);
}
With only that one overload, an expression such as new Vector3f(1,2,3) * 1.5 will compile, but 1.5 * new Vector3f(1,2,3) will not. Since vector-scalar multiplication is commutative, I would like either order to work, so I add another overload with the parameters reversed that simply calls the original overload:
public static Vector3f operator *(in double d, in Vector3f v) => v * d;
Is that the correct way to do things? Should the second overload be implemented as
public static Vector3f operator *(in double d, in Vector3f v) => new Vector3f(d * v.X, d * v.Y, d * v.Z);
instead? Naively I'd expect the compiler to optimise the "extra" call away and always use the first overload if possible (or maybe replace the body of the short overload with that of the long one), but I don't know the behaviour of the C# compiler well enough to say either way.
I realise that in many cases this is the sort of performance quibble that is dwarfed by algorithm choice, but in some cases squeezing every last drop of performance is critical. In performance-critical cases, should commutative operator overloads be implemented as two overloads that are identical except for the order of the parameters, or is it just as efficient to have one delegate to the other?
Here you can see the difference between the two approaches in the generated IL. Keep in mind that this is IL, not the final machine code produced after JIT optimization. First, the (d, v) overload implemented with its own body (your second option):
.method public hidebysig specialname static
valuetype lib.Vector3f op_Multiply([in] float64& d,
[in] valuetype lib.Vector3f& v) cil managed
{
.param [1]
.custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 )
.param [2]
.custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 )
// Code size 33 (0x21)
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldind.r8
IL_0002: ldarg.1
IL_0003: call instance float64 lib.Vector3f::get_X()
IL_0008: mul
IL_0009: ldarg.0
IL_000a: ldind.r8
IL_000b: ldarg.1
IL_000c: call instance float64 lib.Vector3f::get_Y()
IL_0011: mul
IL_0012: ldarg.0
IL_0013: ldind.r8
IL_0014: ldarg.1
IL_0015: call instance float64 lib.Vector3f::get_Z()
IL_001a: mul
IL_001b: newobj instance void lib.Vector3f::.ctor(float64,
float64,
float64)
IL_0020: ret
} // end of method Vector3f::op_Multiply
And here is the (d, v) overload that delegates to the (v, d) operator, where you can see the overhead of the extra call:
.method public hidebysig specialname static
valuetype lib.Vector3f op_Multiply([in] float64& d,
[in] valuetype lib.Vector3f& v) cil managed
{
.param [1]
.custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 )
.param [2]
.custom instance void System.Runtime.CompilerServices.IsReadOnlyAttribute::.ctor() = ( 01 00 00 00 )
// Code size 8 (0x8)
.maxstack 8
IL_0000: ldarg.1
IL_0001: ldarg.0
IL_0002: call valuetype lib.Vector3f lib.Vector3f::op_Multiply(valuetype lib.Vector3f&,
float64&)
IL_0007: ret
} // end of method Vector3f::op_Multiply
There is, of course, an increase in the total number of IL instructions executed, and if that is what you want to avoid, you should have the same code in both of your operators.
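That is, simply repeat the body, as in the second option from your question:

public static Vector3f operator *(in Vector3f v, in double d) => new Vector3f(d * v.X, d * v.Y, d * v.Z);
public static Vector3f operator *(in double d, in Vector3f v) => new Vector3f(d * v.X, d * v.Y, d * v.Z);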
You can also try having a Multiply(Vector3f v, double d) method, decorating it with [MethodImpl(MethodImplOptions.AggressiveInlining)], calling it from both operators, and hoping for the best. The inlining will not show up in the IL, but the JIT will probably inline the Multiply() code.
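A minimal sketch of that shared-helper idea (the Multiply name is just the one suggested above, the in modifiers are carried over from your operators, and the JIT is not guaranteed to inline it):

using System.Runtime.CompilerServices;

public readonly struct Vector3f
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3f(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Single implementation shared by both operators; the attribute asks the JIT to inline it.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static Vector3f Multiply(in Vector3f v, in double d)
        => new Vector3f(d * v.X, d * v.Y, d * v.Z);

    public static Vector3f operator *(in Vector3f v, in double d) => Multiply(v, d);
    public static Vector3f operator *(in double d, in Vector3f v) => Multiply(v, d);
}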
Maybe someone with deeper knowledge of the JIT will have more to say on this.
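If in doubt, the most reliable option is to measure. Here is a rough, unverified sketch using BenchmarkDotNet (the class and method names are only illustrative), comparing both operand orders with the delegating (d, v) overload in place:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class ScalarMultiplyBenchmarks
{
    private readonly Vector3f _v = new Vector3f(1, 2, 3);

    [Benchmark(Baseline = true)]
    public Vector3f VectorTimesScalar() => _v * 1.5;   // direct (v, d) overload

    [Benchmark]
    public Vector3f ScalarTimesVector() => 1.5 * _v;   // (d, v) overload that delegates

    public static void Main() => BenchmarkRunner.Run<ScalarMultiplyBenchmarks>();
}

If the delegating overload is inlined by the JIT, both benchmarks should report essentially the same time.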