Tags: c#, performance, matrix, optimization, mathnet-numerics

Why is matrix multiplication with MathNet.Numerics slower than regular multiplication?


I’m using the MathNet.Numerics library in my C# application to perform matrix operations. However, I’ve noticed that matrix multiplication using this library is significantly slower compared to regular scalar multiplication or operations I’ve implemented manually.

Here’s a simplified example of the code:

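using System.Collections.Generic;
using MathN = MathNet.Numerics.LinearAlgebra; // alias used by the MathN.Matrix<float> references below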

public List<Point3D> ToSomething(
    MathN.Matrix<float> matrixA,
    MathN.Matrix<float> matrixB,
    List<Point3D> points)
{
    MathN.Matrix<float> matrixC = MathN.Matrix<float>.Build.Dense(4, points.Count, 1);
    for (int i = 0; i < points.Count; i++)
    {
        // some code assigning matrixC values
    }

    // Using MathNet.Numerics for matrix multiplication
    var resMat = (matrixA * matrixB) * matrixC;   // Takes significantly more time

    var ptsInPcs = new List<Point3D>(resMat.ColumnCount);
    
    // some other code

    return ptsInPcs;
}

I expected the MathNet.Numerics library to perform matrix multiplication efficiently, but it appears to have significant overhead compared to a basic custom implementation.

For instance, when using a basic nested loop approach like this:

// Custom matrix multiplication method for 1D arrays
public List<Point3D> ToSomethingFlatArray(
    float[] matrixA, 
    float[] matrixB,
    List<Point3D> point)
{
    int numPoints = point.Count;

    float[] tempMatrixA = new float[4 * numPoints];
    float[] tempMatrixB = new float[4 * numPoints];
    float[] tempMatrixC = new float[4 * 4];

    for (int i = 0; i < numPoints; i++)
    {
        // some code
    }

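    // tempMatrixC = matrixA * matrixB (4x4)
    MatrixMultiply(matrixA, 4, 4, matrixB, 4, 4, tempMatrixC);
    // tempMatrixB = tempMatrixC * tempMatrixA (4 x numPoints)
    MatrixMultiply(tempMatrixC, 4, 4, tempMatrixA, 4, numPoints, tempMatrixB);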

    var ptsInPcs = new List<Point3D>(numPoints);

    // some other code

    return ptsInPcs;
}

static void MatrixMultiply(float[] A, int aRows, int aCols, float[] B, int bRows, int bCols, float[] result)
{
    if (aCols != bRows)
        throw new ArgumentException("Matrix dimensions are not compatible for multiplication.");

    for (int i = 0; i < aRows; i++)
    {
        int aRowOffset = i * aCols;
        int resultRowOffset = i * bCols;
        for (int j = 0; j < bCols; j++)
        {
            float sum = 0;
            for (int k = 0; k < aCols; k++)
            {
                sum += A[aRowOffset + k] * B[k * bCols + j];
            }
            result[resultRowOffset + j] = sum;
        }
    }
}

Here are some benchmarking results: [image: benchmarking results]

Questions:

1. Is this a known issue with MathNet.Numerics?
2. Are there optimization techniques or settings in MathNet that can improve matrix multiplication performance?
3. Would a different library or custom implementation offer better performance for matrix operations in C#?


Solution

  • For small matrices, like 4x4, the fastest option is to write out every operation explicitly, ideally using SIMD instructions. That way you avoid branches, loops and similar overhead. Since 4x4, 3x3 and 3x2 matrices are very common in graphics, they often have specialized implementations. Math.NET has Matrix3D, but I'm not sure how well optimized it is. There is also System.Numerics.Matrix4x4, which I believe is SIMD optimized, but it uses float instead of double. See the Matrix4x4.Multiply source: it is just a long list of instructions.
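As a minimal sketch of the System.Numerics route, the code below combines two transforms once and applies the result to each point. The class name Matrix4x4Sketch and the use of Vector3 as a stand-in for the question's Point3D are assumptions made for illustration. Note that System.Numerics treats points as row vectors (v * M), so a matrix built for the column-vector layout in the question would probably need Matrix4x4.Transpose and a reversed multiplication order.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class Matrix4x4Sketch
{
    // Combines two transforms once, then applies the result to every point.
    // With row vectors (v * M), "apply a, then b" is the product a * b.
    public static List<Vector3> TransformAll(
        Matrix4x4 a, Matrix4x4 b, IReadOnlyList<Vector3> points)
    {
        // A single 4x4 product on value types; nothing is allocated on the heap here.
        Matrix4x4 combined = Matrix4x4.Multiply(a, b);

        var result = new List<Vector3>(points.Count);
        for (int i = 0; i < points.Count; i++)
        {
            // Treats the point as (x, y, z, 1) and returns the transformed x, y, z.
            result.Add(Vector3.Transform(points[i], combined));
        }
        return result;
    }
}
```

For example, calling TransformAll with Matrix4x4.CreateScale(2f) as a and Matrix4x4.CreateTranslation(1f, 0f, 0f) as b maps the point (1, 2, 3) to (3, 4, 6): scaled first, then translated.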

    Things become very different for large matrices, and this is the use case Math.NET's Matrix<T> is optimized for. When multiplying, it may consider several factors, for example:

    1. Can it use a GPU to speed up the multiplication? Would it be worth the overhead of transferring data to the GPU?
    2. Can an optimized BLAS package, like Intel MKL, be used to optimize multiplication?
    3. What types of matrices are multiplied? Dense? Sparse? What algorithm should be used for each?
    4. How to optimize cache usage? Matrix multiplication optimization is mostly about reducing unnecessary memory traffic. But it makes the code significantly longer, and may require allocating some temporary memory.

    Checking each of these things takes some time. For small matrices, like 4x4, this time will likely be much longer than the actual multiplication time, but for large matrices the benefits can be huge.
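To illustrate item 4, here is a minimal sketch of loop tiling (blocking) on row-major float arrays, the same layout as the MatrixMultiply method in the question. It shows the general technique only and is not how Math.NET implements it; the class name BlockedMultiplySketch and the default blockSize of 64 are arbitrary assumptions that would need tuning.

```csharp
using System;

public static class BlockedMultiplySketch
{
    // Row-major result = a * b, computed tile by tile so that the data touched
    // by the inner loops stays small enough to remain in cache.
    public static void Multiply(
        float[] a, int aRows, int aCols,
        float[] b, int bCols,
        float[] result, int blockSize = 64)
    {
        if (blockSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(blockSize));
        if (a.Length < aRows * aCols || b.Length < aCols * bCols || result.Length < aRows * bCols)
            throw new ArgumentException("Array lengths do not match the given dimensions.");

        // Tiles accumulate into result, so it must start at zero.
        Array.Clear(result, 0, aRows * bCols);

        for (int i0 = 0; i0 < aRows; i0 += blockSize)
        for (int k0 = 0; k0 < aCols; k0 += blockSize)
        for (int j0 = 0; j0 < bCols; j0 += blockSize)
        {
            int iMax = Math.Min(i0 + blockSize, aRows);
            int kMax = Math.Min(k0 + blockSize, aCols);
            int jMax = Math.Min(j0 + blockSize, bCols);

            for (int i = i0; i < iMax; i++)
            {
                int resultRow = i * bCols;
                for (int k = k0; k < kMax; k++)
                {
                    float aik = a[i * aCols + k];
                    int bRow = k * bCols;
                    // Innermost loop walks contiguous memory in both b and result.
                    for (int j = j0; j < jMax; j++)
                    {
                        result[resultRow + j] += aik * b[bRow + j];
                    }
                }
            }
        }
    }
}
```

For a 4x4 matrix this does the same arithmetic as the plain triple loop plus extra bookkeeping, which is the overhead described above; any benefit only shows up once the matrices no longer fit in cache.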

    So you should use the appropriate tool for your problem. If you are solving large optimization problems, use Math.NET's Matrix<T>. If you are doing a large number of 3D transforms, use System.Numerics.Matrix4x4, Math.NET's Matrix3D, a custom implementation, or one of the many other libraries.
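For the custom-implementation option, a sketch along these lines writes out the per-point arithmetic explicitly instead of looping over matrix indices. It assumes the combined 4x4 matrix (matrixA * matrixB in the question) is stored row-major in a float[16], that its last row is (0, 0, 0, 1) so no perspective divide is needed, and it uses Vector3 as a stand-in for Point3D. The class name UnrolledTransformSketch is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class UnrolledTransformSketch
{
    // m is a row-major 4x4 matrix applied to column vectors (x, y, z, 1),
    // the same convention as (matrixA * matrixB) * matrixC in the question.
    // Assumes an affine transform: the last row of m is ignored.
    public static List<Vector3> TransformAll(float[] m, IReadOnlyList<Vector3> points)
    {
        if (m.Length != 16)
            throw new ArgumentException("Expected a 4x4 matrix (16 values).", nameof(m));

        // Read the 12 entries that affect x, y and z once, outside the loop.
        float m00 = m[0], m01 = m[1], m02 = m[2],  m03 = m[3];
        float m10 = m[4], m11 = m[5], m12 = m[6],  m13 = m[7];
        float m20 = m[8], m21 = m[9], m22 = m[10], m23 = m[11];

        var result = new List<Vector3>(points.Count);
        for (int i = 0; i < points.Count; i++)
        {
            Vector3 p = points[i];
            // Each output coordinate is one row of m dotted with (x, y, z, 1).
            result.Add(new Vector3(
                m00 * p.X + m01 * p.Y + m02 * p.Z + m03,
                m10 * p.X + m11 * p.Y + m12 * p.Z + m13,
                m20 * p.X + m21 * p.Y + m22 * p.Z + m23));
        }
        return result;
    }
}
```

This avoids building the 4 x numPoints input and output matrices entirely, at the cost of hard-coding the 4x4 size.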