Search code examples
c#arraysperformancepearson-correlation

Performance array multiplication Pearson


I compute Pearson correlation (average user/item rating) many times, using my current code performance is very bad:

public double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
        {
            if (x.Length != y.Length)
                throw new ArgumentException("values must be the same length");

            double sumNum = 0;
            double sumDenom = 0;
            double denomX = 0;
            double denomY = 0;

            for (int a = 0; a < x.Length; a++)
            {
                sumNum += (x[a] - meanX[a]) * (y[a] - meanY[a]);
                denomX += Math.Pow(x[a] - meanX[a], 2);
                denomY += Math.Pow(y[a] - meanY[a], 2);
            }

            var sqrtDenomX = Math.Sqrt(denomX);
            var sqrtDenomY = Math.Sqrt(denomY);

            if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;

            sumDenom = Math.Sqrt(denomX) * Math.Sqrt(denomY);

            var correlation = sumNum / sumDenom;

            return correlation;
        }

I am using standard Pearson correlation with MathNet.Numerics, but this is modification of standard and it's not possible to use it. Is there a way to speed it up? How can it be optimizied regarding to time complexity?


Solution

  • Adding some on MSE answer -- changing Pow(x,2) to diff*diff is definitely something you want to do, you may also want to avoid unnecessary bound-checking in your inner-most loop. This may be done using pointers in C#.

    Could be done this way:

        public unsafe double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
        {
            if (x.Length != y.Length)
                throw new ArgumentException("values must be the same length");
    
            double sumNum = 0;
            double sumDenom = 0;
            double denomX = 0;
            double denomY = 0;
            double diffX;
            double diffY;
    
            int len = x.Length;
    
            fixed (double* xptr = &x[0], yptr = &y[0], meanXptr = &meanX[0], meanYptr = &meanY[0])
            {
                for (int a = 0; a < len; a++)
                {
                    diffX = (xptr[a] - meanXptr[a]);
                    diffY = (yptr[a] - meanYptr[a]);
                    sumNum += diffX * diffY;
                    denomX += diffX * diffX;
                    denomY += diffY * diffY;
                }
            }
    
            var sqrtDenomX = Math.Sqrt(denomX);
            var sqrtDenomY = Math.Sqrt(denomY);
    
            if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;
    
            sumDenom = sqrtDenomX * sqrtDenomY;
    
            var correlation = sumNum / sumDenom;
    
            return correlation;
        }