Tags: c#, correlation, pearson, pearson-correlation

How to compute Pearson Correlation between 2 given vectors?


I have to implement this in C#.

Can you explain it step by step using the example below?

vector 1 : [0.3, 0, 1.7, 2.2]
vector 2 : [0, 3.3, 1.2, 0]

Thank you very much.

This will be used for document clustering.


Solution

  • This is a C# adaptation of my answer to the Java version of this question:

    How to find correlation between two integer arrays in java

    First, the definition of the Pearson correlation:

    http://en.wikipedia.org/wiki/Correlation_and_dependence

    Provided that both vectors (let them be IEnumerable<Double>) are of the same length (if they are not, the loop stops at the end of the shorter one):
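For reference, the population form of the coefficient can be written in terms of plain sums, which is the form the running sums in the code correspond to. The symbols are chosen here for illustration: n is the number of pairs, and x_i, y_i are the paired elements of the two vectors.

```latex
r = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i \;-\; \frac{1}{n}\sum_{i=1}^{n} x_i \cdot \frac{1}{n}\sum_{i=1}^{n} y_i}
         {\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \Bigl(\frac{1}{n}\sum_{i=1}^{n} x_i\Bigr)^{2}}\;
          \sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \Bigl(\frac{1}{n}\sum_{i=1}^{n} y_i\Bigr)^{2}}}
```

The numerator is the covariance of the two vectors and the denominator is the product of their standard deviations.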

      private static double Correlation(IEnumerable<Double> xs, IEnumerable<Double> ys) {
        // sums of x, y, x squared etc.
        double sx = 0.0;
        double sy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
    
        int n = 0;
    
        using (var enX = xs.GetEnumerator()) {
          using (var enY = ys.GetEnumerator()) {
            while (enX.MoveNext() && enY.MoveNext()) {
              double x = enX.Current;
              double y = enY.Current;
    
              n += 1;
              sx += x;
              sy += y;
              sxx += x * x;
              syy += y * y;
              sxy += x * y;
            }
          }
        }
    
        // covariance: E[xy] - E[x] * E[y]
        double cov = sxy / n - sx * sy / n / n;
        // (population) standard deviation of x
        double sigmaX = Math.Sqrt(sxx / n -  sx * sx / n / n);
        // (population) standard deviation of y
        double sigmaY = Math.Sqrt(syy / n -  sy * sy / n / n);
    
        // correlation is just a normalized covariance
        return cov / sigmaX / sigmaY;
      }
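
The method above stops at the end of the shorter input and yields NaN for a constant vector without saying why. A minimal sketch of a stricter variant follows, assuming it is acceptable to copy the inputs into arrays. The names PearsonSketch and CorrelationChecked are hypothetical and not part of the original answer. It also swaps in a two-pass computation (means first, then deviations from the means) in place of the single-pass running sums, since that form tends to lose less precision when the values are large relative to their spread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PearsonSketch {
  // Hypothetical helper (not from the original answer): two-pass Pearson
  // correlation with explicit input checks.
  public static double CorrelationChecked(IEnumerable<double> xs, IEnumerable<double> ys) {
    if (xs == null) throw new ArgumentNullException(nameof(xs));
    if (ys == null) throw new ArgumentNullException(nameof(ys));

    // Materialize once so each sequence can be traversed twice.
    double[] x = xs.ToArray();
    double[] y = ys.ToArray();

    if (x.Length != y.Length)
      throw new ArgumentException("Vectors must have the same length.");
    if (x.Length == 0)
      throw new ArgumentException("Vectors must not be empty.");

    // Pass 1: means.
    double meanX = x.Average();
    double meanY = y.Average();

    // Pass 2: sums of products of deviations from the means.
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (int i = 0; i < x.Length; i++) {
      double dx = x[i] - meanX;
      double dy = y[i] - meanY;
      sxy += dx * dy;
      sxx += dx * dx;
      syy += dy * dy;
    }

    // Zero variance means the correlation is undefined. This is an exact
    // comparison, so nearly constant inputs can still give unstable results.
    if (sxx == 0.0 || syy == 0.0) return double.NaN;

    // The 1/n factors cancel, so the raw sums can be used directly.
    return sxy / Math.Sqrt(sxx * syy);
  }
}
```

For the vectors in the question this should agree with Correlation up to floating-point rounding; whether to throw or return NaN on degenerate input is a design choice that depends on how the clustering code wants to treat such documents.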
    

    Test:

      // -0.539354840012899
      Double result = Correlation(
        new Double[] { 0.3, 0, 1.7, 2.2 }, 
        new Double[] { 0, 3.3, 1.2, 0 });
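
Since the question asks for a step-by-step explanation, here is the same computation done by hand for the two example vectors, following the variables in Correlation (sx, sy, sxx, syy, sxy, then cov, sigmaX, sigmaY). Intermediate values are rounded to four decimal places, so the last digit may differ slightly from the full-precision result.

```latex
\begin{aligned}
n &= 4, \quad \sum x = 0.3 + 0 + 1.7 + 2.2 = 4.2, \quad \sum y = 0 + 3.3 + 1.2 + 0 = 4.5 \\
\sum x^2 &= 0.09 + 0 + 2.89 + 4.84 = 7.82 \\
\sum y^2 &= 0 + 10.89 + 1.44 + 0 = 12.33 \\
\sum xy &= 0 + 0 + 2.04 + 0 = 2.04 \\
\mathrm{cov} &= \frac{2.04}{4} - \frac{4.2 \cdot 4.5}{16} = 0.51 - 1.18125 = -0.67125 \\
\sigma_x &= \sqrt{\frac{7.82}{4} - \frac{4.2^2}{16}} = \sqrt{1.955 - 1.1025} = \sqrt{0.8525} \approx 0.9233 \\
\sigma_y &= \sqrt{\frac{12.33}{4} - \frac{4.5^2}{16}} = \sqrt{3.0825 - 1.265625} = \sqrt{1.816875} \approx 1.3479 \\
r &= \frac{-0.67125}{0.9233 \cdot 1.3479} \approx -0.5394
\end{aligned}
```

The negative value reflects that the large entries of one vector line up with the small or zero entries of the other.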