
How is Pearson's Correlation calculated in MQL4?


Working on an EA that trades correlated pairs ( a hedge ), I need to code a correlation matrix, like the ones on myfxbook or Oanda.

The main point is that I want to be able to loop through each value in the matrix and check if it is greater than 85.0 or so.
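
A minimal sketch of the intended scan, assuming the matrix is a 2-D double array of pair-wise correlations expressed in percent ( the name corrMatrix and its size are illustrative only; it would be filled elsewhere, e.g. by the PearsonCorr_r() shown in the Solution below ):

    double corrMatrix[8][8];                                   // illustrative size; filled elsewhere, values in percent
    
    for (     int row = 0; row < ArrayRange( corrMatrix, 0 ); row++ ){
          for( int col = 0; col < ArrayRange( corrMatrix, 1 ); col++ ){
               if ( row == col ) continue;                     // skip the trivial self-correlation
               if ( MathAbs( corrMatrix[row][col] ) > 85.0 ){  // the "greater than 85.0 or so" test
                    Print( "Strongly correlated pair found at [", row, "][", col, "]" );
               }
          }
    }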


Solution

  • Q: How is Pearson's Correlation calculated in MQL4?

    Method A:
    use MQL4 itself to compute the Pearson's r directly:

    If the precision of double is sufficient, MQL4 code can implement the process for reasonable-sized vectors of values ( X[], Y[] ):

    #define RET_OK    0
    #define RET_ERROR EMPTY
    #define VAL_ERROR EMPTY_VALUE
    
    int   PearsonCorr_r( const double &vectorX[], //   |-> INPUT X[]      = { 1, 3,  5,  5,  6 }
                         const double &vectorY[], //   |-> INPUT Y[]      = { 5, 6, 10, 12, 13 }
                         double       &pearson_r  // <=|   returns RESULT = 0.947 for these inputs
                         ){
          double  sumX = 0,
                 meanX = 0,
                 meanY = 0,
                  sumY = 0,
                 sumXY = 0,
                 sumX2 = 0,
                 sumY2 = 0;
              // deviation_score_x[],               // may be re-used for _x^2
              // deviation_score_y[],               // may be re-used for _y^2
              // deviation_score_xy[];
    /* =====================================================================
                      DEVIATION SCORES                                       >>> http://onlinestatbook.com/2/describing_bivariate_data/calculation.html
                      ( note: this worked example uses Y[0] = 4, not the 5 used in the test vectors above )
            X[]  Y[]  x      y      xy    x^2    y^2
            1    4   -3     -5      15    9     25
            3    6   -1     -3       3    1      9
            5   10    1      1       1    1      1
            5   12    1      3       3    1      9
            6   13    2      4       8    4     16
           _______________________________________
    
    SUM    20   45    0      0      30   16     60
    MEAN    4    9    0      0       6   
    
           r = SUM(xy) / SQRT(  SUM( x^2 ) * SUM( y^2 ) )
           r =      30 / SQRT( 960 )
           r = 0.968
       =====================================================================
                                                                            */
          int    vector_maxLEN = MathMin( ArrayRange( vectorX, 0 ),
                                          ArrayRange( vectorY, 0 )
                                          );
    
          if (   vector_maxLEN == 0 ){
                 pearson_r = VAL_ERROR;          // STORE VAL_ERROR IN RESULT
                 return(     RET_ERROR );        // FLAG RET_ERROR in JIT/RET
          }
          for ( int jj = 0; jj < vector_maxLEN; jj++ ){
                sumX += vectorX[jj];
                sumY += vectorY[jj];
          }
          meanX = sumX / vector_maxLEN;          // DIV!0 FUSED
          meanY = sumY / vector_maxLEN;          // DIV!0 FUSED
    
          for ( int jj = 0; jj < vector_maxLEN; jj++ ){
             // deviation_score_x[ jj] =   meanX - vectorX[jj];  // 
             // deviation_score_y[ jj] =   meanY - vectorY[jj];
             // deviation_score_xy[jj] = deviation_score_x[jj]
             //                        * deviation_score_y[jj];
             //              sumXY    += deviation_score_x[jj]
             //                        * deviation_score_y[jj];
                             sumXY    += ( meanX - vectorX[jj] ) // PSPACE MOTIVATED MINIMALISTIC WITH CACHE-BENEFITS IN PROCESSING
                                       * ( meanY - vectorY[jj] );
             // deviation_score_x[jj] *= deviation_score_x[jj];  // PSPACE MOTIVATED RE-USE, ROW-WISE DESTRUCTIVE, BUT VALUE WAS NEVER USED AGAIN
             //              sumX2    += deviation_score_x[jj]
             //                        * deviation_score_x[jj];
                             sumX2    += ( meanX - vectorX[jj] ) // PSPACE MOTIVATED MINIMALISTIC WITH CACHE-BENEFITS IN PROCESSING
                                       * ( meanX - vectorX[jj] );
             // deviation_score_y[jj] *= deviation_score_y[jj];  // PSPACE MOTIVATED RE-USE, ROW-WISE DESTRUCTIVE, BUT VALUE WAS NEVER USED AGAIN
             //              sumY2    += deviation_score_y[jj]
             //                        * deviation_score_y[jj];
                             sumY2    += ( meanY - vectorY[jj] ) // PSPACE MOTIVATED MINIMALISTIC WITH CACHE-BENEFITS IN PROCESSING
                                       * ( meanY - vectorY[jj] );
          }
          double denom = MathSqrt( sumX2
                                 * sumY2
                                   );
          if (   denom == 0 ){                  // DIV!0 FUSED ( zero-variance input )
                 pearson_r = VAL_ERROR;         // STORE VAL_ERROR IN RESULT
                 return(     RET_ERROR );       // FLAG RET_ERROR in JIT/RET
          }
          pearson_r = sumXY / denom;            // STORE RET VALUE IN RESULT
          return( RET_OK );                     // FLAG RET_OK in JIT/RET
    }
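
    A minimal usage sketch follows -- the symbols, the time-frame and the look-back depth are illustrative choices only, not part of the function above -- showing how one returned r could feed a single cell of the correlation matrix asked about in the question:

    void OnTick(){                                       // or any other trigger one prefers
         double X[], Y[];
         double r     = 0;
         int    depth = 100;                             // illustrative look-back depth
         ArrayResize( X, depth );
         ArrayResize( Y, depth );
         for ( int i = 0; i < depth; i++ ){
               X[i] = iClose( "EURUSD", PERIOD_H1, i );  // illustrative pair of symbols
               Y[i] = iClose( "GBPUSD", PERIOD_H1, i );
         }
         if (  PearsonCorr_r( X, Y, r ) == RET_OK
            && MathAbs( r ) * 100.0 > 85.0               // the 85.0-or-so threshold from the question
               ){
               Print( "EURUSD / GBPUSD correlate: r == ", DoubleToString( r, 4 ) );
         }
    }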
    

    Method B:
    re-use external libraries that already implement the Pearson correlation -- in R, MATLAB et al:

    One may use distributed processing, for example over a ZeroMQ messaging infrastructure, to request the calculation to be performed outside of MQL4 / independently of the localhost processing.

    If interested, read my other posts on distributed processing in MQL4 ( a code-example -- just to have some feeling of how the MQL4 side gets set up -- could be found here ) and MATLAB ( a code-example of the ZeroMQ-infrastructure setup could be found here ),

    thus allowing one to use the MATLAB built-in implementation of the Pearson correlation ( remember to properly pre-format the data into columns, and best if a DIV!0-fusing guard is also added ), to compute:

    [ RHO, PVAL ] = corr( vectorX, vectorY, 'type', 'Pearson' );
                                                   % note: double-r in corr()
                                                   %       'Pearson' is the default method
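
    Whatever messaging layer is chosen, the MQL4 side only has to pre-format the two vectors into columns before shipping them, as noted above. A minimal sketch of that formatting step, with an illustrative helper name, symbols and depth ( the actual ZeroMQ send / receive calls themselves are intentionally left out here ):

    string PreFormatColumns( const string symbolA,            // sketch only: pre-format two symbols'
                             const string symbolB,            //              closes into CSV columns,
                             const int    depth               //              ready to be shipped over
                             ){                               //              the messaging layer
          string payload = "";
          for ( int i = 0; i < depth; i++ ){
                payload = StringConcatenate( payload,
                                             DoubleToString( iClose( symbolA, PERIOD_H1, i ), 5 ), ",",
                                             DoubleToString( iClose( symbolB, PERIOD_H1, i ), 5 ), "\n"
                                             );
          }
          return( payload );                                  // the ZeroMQ send / receive itself is omitted
    }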
    

    Similarly, the R language has a built-in tool:

    corr_r <- cor( vecORmatX, vecORmatY, use = "everything", method = "pearson" )
                                                                # "pearson" is the default method
    

    Last but not least, there is the Python scipy.stats.stats.pearsonr implementation as a tool, working in both float32 and float64 precision:

    >>> import numpy as np
    >>> from scipy.stats.stats import pearsonr as pearson_r
    >>>
    >>> X = np.zeros( (5,), dtype = np.float32 )
    >>> Y = np.zeros( (5,), dtype = np.float32 )
    >>>
    >>> X[0] =  1; X[1] = 3; X[2] =  5; X[3] =  5; X[4] =  6
    >>> Y[0] =  5; Y[1] = 6; Y[2] = 10; Y[3] = 12; Y[4] = 13
    >>>
    >>> pearson_r( X, Y)
    (0.94704783, 0.01451040731338055)
    >>>
    >>> X = np.zeros( (5,), dtype = np.float64 )
    >>> Y = np.zeros( (5,), dtype = np.float64 )
    >>>
    >>> X[0] =  1; X[1] = 3; X[2] =  5; X[3] =  5; X[4] =  6
    >>> Y[0] =  5; Y[1] = 6; Y[2] = 10; Y[3] = 12; Y[4] = 13
    >>>
    >>> pearson_r( X, Y)
    (0.94704783738690446, 0.014510403904375592)
    >>>
    

    Epilogue:
    Method A yields results == python.scipy.stats.stats.pearsonr( X, Y )
    ( the cited onlinestatbook.com worked example arrives at 0.968 only because it uses Y[0] = 4; for the test vectors above, with Y[0] = 5, the correct value is r == 0.9470 )

    2016.10.13 11:31:55.421 ___StackOverflow_Pearson_r_DEMO XAUUSD,H1:
                               PearsonCorr_r( testX, testY, Pearson_r ):= 0.968
                               The actual call returned    aReturnCODE == 0,
                                      whereas the          Pearson_r   == 0.9470