C# Can LinearRegression code from Math.NET Numerics be made faster?


I need to do multiple linear regression efficiently. I am trying to use the Math.NET Numerics package, but it seems slow. Perhaps it is the way I have coded it? For this example I have only simple regression (a single x value).

I have this snippet:

        public class barData
        {
            public double[] Xs;
            public double Mid;
            public double Value;

        }

        public List<barData> B;


        var xdata = B.Select(x=>x.Xs[0]).ToArray();
        var ydata = B.Select(x => x.Mid).ToArray();

        var X = DenseMatrix.CreateFromColumns(new[] { new DenseVector(xdata.Length, 1), new DenseVector(xdata) });
        var y = new DenseVector(ydata);

        var p = X.QR().Solve(y);
        var b = p[0];
        var a = p[1];
        B[0].Value = (a * (B[0].Xs[0])) + b;

This runs about 20x SLOWER than this pure C#:

        // Accumulate the sums of x and y, then divide to get the means.
        double xAvg = 0;
        double yAvg = 0;

        for (int x = Length - 1; x >= 0; x--)
        {
            xAvg += B[x].Xs[0];
            yAvg += B[x].Mid;
        }

        xAvg = xAvg / B.Count;
        yAvg = yAvg / B.Count;

        // v1: sum of cross-deviations; v2: sum of squared x-deviations.
        double v1 = 0;
        double v2 = 0;

        for (int x = Length - 1; x >= 0; x--)
        {
            v1 += (B[x].Xs[0] - xAvg) * (B[x].Mid - yAvg);
            v2 += (B[x].Xs[0] - xAvg) * (B[x].Xs[0] - xAvg);
        }

        // Slope (a) and intercept (b) of the least-squares line.
        double a = v1 / v2;
        double b = yAvg - a * xAvg;

        B[0].Value = (a * B[Length - 1].Xs[0]) + b;

Also, if Math.NET is the issue, I would be grateful if anyone knows a simple way to extend my pure C# code to multiple Xs.


Solution

  • Using a QR decomposition is a very generic approach that can deliver least squares regression solutions to any function with linear parameters, no matter how complicated it is. It is therefore not surprising that it cannot compete on computation time with a very specific, direct implementation, especially not in the simple case of y: x -> a + b*x. Unfortunately, Math.NET Numerics does not yet provide direct regression routines that you could use instead.

    However, there are still a couple of things you can try for better speed:

    • Use a thin instead of a full QR decomposition, i.e. pass QRMethod.Thin to the QR method
    • Use our native MKL provider (much faster QR, but no longer purely managed code)
    • Tweak threading, e.g. try to disable multi-threading completely (Control.ConfigureSingleThread()) or tweak its parameters

    If the data set is very large, there are also more efficient ways to build the matrix, but that is likely to matter little next to the cost of the QR itself (profile to confirm).
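
On the second part of the question (extending the hand-written code to multiple Xs), one possible direction is sketched below. It is not the QR approach discussed above and it does not use Math.NET: it accumulates the normal equations (XᵀX)β = Xᵀy in a single pass and solves the small system by Gaussian elimination with partial pivoting. The class name `NormalEquationsRegression`, the method `Fit`, and the collinearity threshold are all made up for this illustration. Normal equations are usually fast when there are only a few predictors, but they are less numerically robust than QR on ill-conditioned data, so treat this as a starting point rather than a drop-in replacement.

```csharp
using System;

public static class NormalEquationsRegression
{
    // Hypothetical helper: fits y ≈ beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1]
    // by least squares. xs[i] holds the k predictor values of observation i,
    // ys[i] is the corresponding response.
    public static double[] Fit(double[][] xs, double[] ys)
    {
        int n = xs.Length;
        int m = xs[0].Length + 1;               // +1 for the intercept column

        // Accumulate A = X^T X and c = X^T y in one pass, without building X.
        var a = new double[m, m];
        var c = new double[m];
        var row = new double[m];
        for (int i = 0; i < n; i++)
        {
            row[0] = 1.0;
            for (int j = 1; j < m; j++) row[j] = xs[i][j - 1];
            for (int j = 0; j < m; j++)
            {
                c[j] += row[j] * ys[i];
                for (int k = 0; k < m; k++) a[j, k] += row[j] * row[k];
            }
        }

        // Forward elimination with partial pivoting on the m x m system.
        for (int col = 0; col < m; col++)
        {
            int pivot = col;
            for (int r = col + 1; r < m; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;

            // Crude, scale-dependent singularity check (assumed threshold).
            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("Predictors are (nearly) collinear.");

            if (pivot != col)
            {
                for (int k = 0; k < m; k++)
                {
                    double t = a[col, k]; a[col, k] = a[pivot, k]; a[pivot, k] = t;
                }
                double tc = c[col]; c[col] = c[pivot]; c[pivot] = tc;
            }

            for (int r = col + 1; r < m; r++)
            {
                double f = a[r, col] / a[col, col];
                for (int k = col; k < m; k++) a[r, k] -= f * a[col, k];
                c[r] -= f * c[col];
            }
        }

        // Back substitution on the resulting upper-triangular system.
        var beta = new double[m];
        for (int r = m - 1; r >= 0; r--)
        {
            double s = c[r];
            for (int k = r + 1; k < m; k++) s -= a[r, k] * beta[k];
            beta[r] = s / a[r, r];
        }
        return beta;
    }
}
```

With the `barData` list from the question, a call might look like `var beta = NormalEquationsRegression.Fit(B.Select(d => d.Xs).ToArray(), B.Select(d => d.Mid).ToArray());` (this needs `using System.Linq;`), where `beta[0]` is the intercept and `beta[1..]` are the slopes, one per entry of `Xs`.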