How do I use cor.test within foreach in R?


I'm not sure how to correct my code so that cor.test runs within foreach. The attempts I have tried are below (prac3 is the name of the data frame I am working with). The variables I want to correlate are in its columns.

cor.res <- foreach(i = seq_len(ncol(prac3))) %dopar% {
  d <- prac3[, i]
  correlation <- cor.test(d$x, d$y, method = "spearman")
  out <- c(correlation$estimate, correlation$p.value)
}


cor.res <- foreach(i = seq_len(ncol(prac3))) %dopar% {
  correlation <- cor.test(prac3[, i], prac3, method = "spearman")
  out <- c(correlation$estimate, correlation$p.value)
}


cor.res <- foreach(i = seq_len(ncol(prac3))) %dopar% {
  correlation <- cor.test(prac3[[i[1]]], prac3[[i[2]]], method = "spearman")
  out <- c(correlation$estimate, correlation$p.value)
}

Any help would be appreciated.


Solution

  • This is how I would do it. The issue is that cor.test() wants two numeric vectors, x and y. So, rather than looping over a single column index (i), you probably want to loop over all pairs of variables. That said, foreach loops over a single index, so I would first build a data frame that holds every pair of column indices, keeping each unordered pair only once:

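    library(dplyr)
    library(foreach)
    data(mtcars)
    mtcars <- mtcars[, 1:7]  # keep the first seven columns for a small example
    # every (row, col) combination of column indices
    eg <- expand.grid(row = seq_len(ncol(mtcars)), col = seq_len(ncol(mtcars)))
    # keep each unordered pair once and drop self-pairs
    eg <- eg %>% filter(row < col)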
    

    Next, you can change the numbers to variable names in the following way:

    eg$row <- names(mtcars)[eg$row]
    eg$col <- names(mtcars)[eg$col]
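
    As an alternative sketch (not part of the approach above), base R's combn() can produce the same set of unordered pairs directly as names, without expand.grid() or filter(). The name var_pairs is illustrative, and the pairs come out in a different order than in eg:

    ```r
    # All unordered pairs of the first seven mtcars column names.
    vars <- names(mtcars)[1:7]
    # combn() returns a 2 x 21 matrix; transpose so each row is one pair.
    var_pairs <- as.data.frame(t(combn(vars, 2)), stringsAsFactors = FALSE)
    names(var_pairs) <- c("row", "col")
    nrow(var_pairs)  # number of pairs, choose(7, 2)
    ```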
    

    Now, you need to loop over all the rows of eg, where eg[i,1] will stand in for the first variable name and eg[i,2] will stand in for the second variable name. We can use tidy() from the broom package to clean the results up a bit and then combine them with rbind:

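    # .packages loads dplyr on each worker so %>% and mutate() are available there
    cor.res <- foreach(i = seq_len(nrow(eg)), .combine = rbind,
                       .packages = "dplyr") %dopar% {
      broom::tidy(cor.test(mtcars[, eg[i, 1]],
                           mtcars[, eg[i, 2]],
                           method = "spearman")) %>%
        mutate(row = eg[i, 1], col = eg[i, 2])
    }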
    
    cor.res
    #> # A tibble: 21 × 7
    #>    estimate statistic  p.value method                        alter…¹ row   col  
    #>       <dbl>     <dbl>    <dbl> <chr>                         <chr>   <chr> <chr>
    #>  1   -0.911    10425. 4.69e-13 Spearman's rank correlation … two.si… mpg   cyl  
    #>  2   -0.909    10415. 6.37e-13 Spearman's rank correlation … two.si… mpg   disp 
    #>  3    0.928      395. 2.28e-14 Spearman's rank correlation … two.si… cyl   disp 
    #>  4   -0.895    10337. 5.09e-12 Spearman's rank correlation … two.si… mpg   hp   
    #>  5    0.902      536. 1.87e-12 Spearman's rank correlation … two.si… cyl   hp   
    #>  6    0.851      813. 6.79e-10 Spearman's rank correlation … two.si… disp  hp   
    #>  7    0.651     1902. 5.38e- 5 Spearman's rank correlation … two.si… mpg   drat 
    #>  8   -0.679     9160. 1.94e- 5 Spearman's rank correlation … two.si… cyl   drat 
    #>  9   -0.684     9186. 1.61e- 5 Spearman's rank correlation … two.si… disp  drat 
    #> 10   -0.520     8294. 2.28e- 3 Spearman's rank correlation … two.si… hp    drat 
    #> # … with 11 more rows, and abbreviated variable name ¹​alternative
    

    Created on 2023-02-23 with reprex v2.0.2
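
Note that %dopar% only runs in parallel once a backend has been registered; with none registered, foreach runs the loop sequentially and issues a warning. A minimal sketch of registering one is below, assuming the doParallel package is installed. The worker count of 2 and the names dat, var_pairs, and res are illustrative choices, and the loop body returns a plain data frame so that no extra packages are needed on the workers:

```r
library(foreach)
library(doParallel)  # assumption: doParallel is installed

cl <- parallel::makeCluster(2)  # 2 workers, chosen arbitrarily
registerDoParallel(cl)

dat <- mtcars[, 1:7]
var_pairs <- t(combn(names(dat), 2))  # one row per unordered pair of columns

res <- foreach(i = seq_len(nrow(var_pairs)), .combine = rbind) %dopar% {
  ct <- cor.test(dat[[var_pairs[i, 1]]], dat[[var_pairs[i, 2]]],
                 method = "spearman")
  data.frame(row = var_pairs[i, 1], col = var_pairs[i, 2],
             estimate = unname(ct$estimate), p.value = ct$p.value)
}

parallel::stopCluster(cl)  # release the workers when done
```

Here res has one row per pair of columns, with the Spearman estimate and its p-value; calling stopCluster() at the end shuts the worker processes down.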