
Is Pearson correlation faster than Spearman correlation in R?


I would like to determine many correlations (millions) between pairs of columns, so I am worried about computing time.

I suspect that Pearson correlations (based on values) are faster to calculate in R than Spearman correlations (based on ranks). Is that correct?

How can I find out, please? Thank you.


Solution

  • You can use the rbenchmark package for this.

    library(rbenchmark)
    

    1,000 rows, 100 replications:

    x1 <- rnorm(1000)
    y1 <- rnorm(1000)
    
    benchmark(spearman = {
      cor(x1, y1, method = "spearman")
    },
    pearson = {
      cor(x1, y1, method = "pearson")
    },
    replications = 100)
    #>       test replications elapsed relative user.self sys.self user.child
    #> 2  pearson          100   0.002        1     0.002        0          0
    #> 1 spearman          100   0.014        7     0.013        0          0
    #>   sys.child
    #> 2         0
    #> 1         0
    

    1,000,000 rows, 100 replications:

    x2 <- rnorm(1000000)
    y2 <- rnorm(1000000)
    
    benchmark(spearman = {
      cor(x2, y2, method = "spearman")
    },
    pearson = {
      cor(x2, y2, method = "pearson")
    },
    replications = 100)
    #>       test replications elapsed relative user.self sys.self user.child
    #> 2  pearson          100   0.717    1.000     0.717    0.001          0
    #> 1 spearman          100  37.336   52.073    36.797    0.537          0
    #>   sys.child
    #> 2         0
    #> 1         0
    

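    This confirms your assumption: Pearson is considerably faster than Spearman. In these runs the gap widens as the number of rows grows, from about 7 times slower at 1,000 rows to about 52 times slower at 1,000,000 rows. That is plausible, because Spearman has to rank (and therefore sort) both vectors before correlating them.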
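
    If you do need Spearman for many column pairs, it may help that Spearman's coefficient is the Pearson coefficient of the ranks. Calling cor(x, y, method = "spearman") pair by pair re-ranks each column on every call, so ranking each column once up front and then using Pearson on the ranks should avoid that repeated work. The following is only a sketch on made-up data (the matrix m and its size are assumptions, not taken from the question), and it has not been benchmarked here:

```r
# Hypothetical data: 1,000 rows and 200 columns of random values.
set.seed(1)
m <- matrix(rnorm(1000 * 200), nrow = 1000, ncol = 200)

# Rank every column once; apply() returns a matrix with the same shape as m.
r <- apply(m, 2, rank)

# Pearson on the ranks: a single call gives all 200 x 200 pairwise coefficients.
rho_from_ranks <- cor(r)

# Reference result: let cor() do the ranking itself.
rho_spearman <- cor(m, method = "spearman")

# Both results should agree up to floating-point error.
all.equal(rho_from_ranks, rho_spearman)
```

    Passing a whole matrix to cor() also returns every pairwise correlation in one call, which is usually preferable to looping over pairs in R. You can time both variants on your own data with benchmark() as shown above.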