I wrote the following code to experiment with System.Numerics.Vector4 and evaluate the performance gain:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Numerics;

namespace ConsoleApp8
{
    class Program
    {
        static void Main(string[] args)
        {
            const int N = 100000000;
            long ticks_start, ticks_end;

            ticks_start = DateTime.Now.Ticks;
            float[] a = { 10, 10, 10, 10 };
            float[] b = new float[4];
            for (int i = 0; i < N; i++)
                for (int j = 0; j < 4; j++)
                    b[j] = a[j] + a[j];
            ticks_end = DateTime.Now.Ticks;
            Console.WriteLine($"Done in {ticks_end - ticks_start} ticks");

            ticks_start = DateTime.Now.Ticks;
            Vector4 result;
            Vector4 v = new Vector4();
            for (int i = 0; i < N; i++)
            {
                v.W = a[0];
                v.X = a[1];
                v.Y = a[2];
                v.Z = a[3];
                result = Vector4.Add(v, v);
                b[0] = result.W;
                b[1] = result.X;
                b[2] = result.Y;
                b[3] = result.Z;
            }
            ticks_end = DateTime.Now.Ticks;
            Console.WriteLine($"Done in {ticks_end - ticks_start} ticks");

            Console.ReadKey();
        }
    }
}
The output is:
Done in 14257591 ticks
Done in 18591588 ticks
So it seems that we get no advantage using Vector4. The Add method returns a new instance of Vector4. Is there a way to mutate one of the vectors to avoid the memory allocation impact? Or maybe there is another way to do things?
I haven't really benchmarked it, but this, in the inner loop:
for (int i = 0; i < N; i++)
{
    v.W = a[0];
    v.X = a[1];
    v.Y = a[2];
    v.Z = a[3];
    result = Vector4.Add(v, v);
    b[0] = result.W;
    b[1] = result.X;
    b[2] = result.Y;
    b[3] = result.Z;
}
Would be more equivalent to:
float[] a = { 10, 10, 10, 10 };
float[] b = new float[4];
float[] v = new float[4];
float[] result = new float[4];
for (int i = 0; i < N; i++)
{
    v[0] = a[0];
    v[1] = a[1];
    v[2] = a[2];
    v[3] = a[3];
    result[0] = v[0] + v[0];
    result[1] = v[1] + v[1];
    result[2] = v[2] + v[2];
    result[3] = v[3] + v[3];
    b[0] = result[0];
    b[1] = result[1];
    b[2] = result[2];
    b[3] = result[3];
}
Than what you've written. In the Vector4 loop you make four assignments (copying a into v), then the addition, then another four assignments (copying result into b); your array loop skips both sets of copies.
I just tested it in LINQPad using stopwatches (this is by no means a benchmark), and when the array version does the same copies, it is slower than the Vector4 version (even with the sum unrolled), though only by a very tight margin.
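On the allocation question: Vector4 is a struct, so Vector4.Add returns its result by value and does not allocate on the heap. As a rough sketch of how the comparison could be set up without the per-component copies, the code below builds the vector once with the Vector4(x, y, z, w) constructor, writes the result back with CopyTo, and times both loops with Stopwatch instead of DateTime.Now.Ticks. The class name Vector4Timing and the helpers ScalarAdd and VectorAdd are made up for illustration. It is still not a proper benchmark: the JIT is free to simplify loops whose body never changes, so treat the numbers as indicative at best.

```csharp
using System;
using System.Diagnostics;
using System.Numerics;

public static class Vector4Timing
{
    // Hypothetical helper: the scalar loop from the question.
    public static float[] ScalarAdd(float[] a, int n)
    {
        var b = new float[4];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < 4; j++)
                b[j] = a[j] + a[j];
        return b;
    }

    // Hypothetical helper: loads the vector once, adds in the loop,
    // and copies the result out once at the end.
    public static float[] VectorAdd(float[] a, int n)
    {
        // Constructor order is X, Y, Z, W.
        var v = new Vector4(a[0], a[1], a[2], a[3]);
        Vector4 result = Vector4.Zero;
        for (int i = 0; i < n; i++)
            result = Vector4.Add(v, v); // returned by value, no heap allocation

        var b = new float[4];
        result.CopyTo(b); // writes X, Y, Z, W into b[0..3]
        return b;
    }

    public static void Main()
    {
        const int N = 100000000;
        float[] a = { 10, 10, 10, 10 };

        var sw = Stopwatch.StartNew();
        ScalarAdd(a, N);
        sw.Stop();
        Console.WriteLine($"Scalar:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        VectorAdd(a, N);
        sw.Stop();
        Console.WriteLine($"Vector4: {sw.ElapsedMilliseconds} ms");
    }
}
```

For numbers you can rely on, build in Release mode, run outside the debugger, and ideally use a dedicated benchmarking harness, since a single Stopwatch run includes JIT warm-up time.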