r statistics-bootstrap

Obtain bootstrap correlation interval between one variable and all others in a data frame


I would like to get bootstrap correlation estimates (rho and its confidence interval) between one variable (v) and all the others (x, y, z). If I were only interested in the correlation itself (not the bootstrap), I would use the following code.

df[,-1] %>% map(.,~cor.test(df$v, .))

How can I apply the same idea with the bootstrap?

v<-rnorm(20,50,2)
x<-rnorm(20,2)
y<-rnorm(20,10,2)
z<-rnorm(20,10,2)

df<-data.frame(v=v,x=x,y=y,z=z)

Solution

  • Write a function that runs the tests on a resampled data set, just as in the single-run case, then extracts the estimate of rho and the confidence interval for each variable and returns them.

    Use the boot package to call this function R times; the code below uses R = 5.

    set.seed(2022)
    v<-rnorm(20,50,2)
    x<-rnorm(20,2)
    y<-rnorm(20,10,2)
    z<-rnorm(20,10,2)
    
    df<-data.frame(v=v,x=x,y=y,z=z)
    
    library(magrittr)
    library(boot)
    
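    # Statistic for boot(): i holds the row indices of one bootstrap resample.
    boot_cor <- function(data, i) {
      d <- data[i, ]
      # Use d$v rather than the global v so that v is resampled with the
      # other columns (the output printed below was generated with the
      # global v, so rerunning gives different numbers).
      cor_list <- d[-1] %>% purrr::map(~cor.test(d$v, .))
      rho <- sapply(cor_list, `[[`, 'estimate')      # one rho per variable
      conf.int <- sapply(cor_list, `[[`, 'conf.int') # 2 x 3: lower, upper
      # 3 x 3 matrix; boot() flattens it column-wise into one row of b$t
      rbind(rho, conf.int)
    }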
    
    b <- boot(df, boot_cor, R = 5)
    b$t
    #>            [,1]        [,2]      [,3]        [,4]        [,5]       [,6]        [,7]       [,8]      [,9]
    #> [1,] 0.10901729 -0.35040802 0.5261551  0.47729667  0.04408784 0.75941797 -0.28443711 -0.6456858 0.1808467
    #> [2,] 0.17520115 -0.28978685 0.5732758 -0.01138502 -0.45163044 0.43331887  0.20238775 -0.2637551 0.5918977
    #> [3,] 0.45951068  0.02132649 0.7496046 -0.42373326 -0.72947047 0.02312342 -0.09795514 -0.5180212 0.3601784
    #> [4,] 0.08274794 -0.37344763 0.5067140 -0.26406702 -0.63265841 0.20206624 -0.32949464 -0.6737737 0.1323194
    #> [5,] 0.18779791 -0.27781014 0.5819556  0.11815477 -0.34226148 0.53281673  0.16289648 -0.3013469 0.5647101
    
    # perhaps more readable: one 5 x 3 slice (rho, lower, upper) per variable
    array(b$t, dim = c(5, 3, 3),
          dimnames = list(NULL, c("rho", "lower", "upper"), names(df[-1])))
    #> , , x
    #> 
    #>             rho       lower     upper
    #> [1,] 0.10901729 -0.35040802 0.5261551
    #> [2,] 0.17520115 -0.28978685 0.5732758
    #> [3,] 0.45951068  0.02132649 0.7496046
    #> [4,] 0.08274794 -0.37344763 0.5067140
    #> [5,] 0.18779791 -0.27781014 0.5819556
    #> 
    #> , , y
    #> 
    #>              rho       lower      upper
    #> [1,]  0.47729667  0.04408784 0.75941797
    #> [2,] -0.01138502 -0.45163044 0.43331887
    #> [3,] -0.42373326 -0.72947047 0.02312342
    #> [4,] -0.26406702 -0.63265841 0.20206624
    #> [5,]  0.11815477 -0.34226148 0.53281673
    #> 
    #> , , z
    #> 
    #>              rho      lower     upper
    #> [1,] -0.28443711 -0.6456858 0.1808467
    #> [2,]  0.20238775 -0.2637551 0.5918977
    #> [3,] -0.09795514 -0.5180212 0.3601784
    #> [4,] -0.32949464 -0.6737737 0.1323194
    #> [5,]  0.16289648 -0.3013469 0.5647101
    

    Created on 2022-06-04 by the reprex package (v2.0.1)
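
The intervals that `cor.test` returns in each replicate are parametric intervals computed on a resample, not bootstrap intervals as such. As a rough sketch of one alternative, the replicated correlations themselves can be summarised with a percentile interval via `boot.ci`. The names `boot_rho`, `b2` and `res`, and the choices of `R = 2000` and `type = "perc"`, are assumptions made here for illustration and are not part of the original answer.

```r
library(boot)

set.seed(2022)
df <- data.frame(
  v = rnorm(20, 50, 2),
  x = rnorm(20, 2),
  y = rnorm(20, 10, 2),
  z = rnorm(20, 10, 2)
)

# Statistic (hypothetical helper): correlation of v with every other column,
# computed on the resampled rows i, so v is resampled together with x, y, z.
boot_rho <- function(data, i) {
  d <- data[i, ]
  drop(cor(d$v, d[-1]))  # named vector: x, y, z
}

# R = 2000 is an assumed value; percentile intervals need far more than 5 replicates.
b2 <- boot(df, boot_rho, R = 2000)

# One percentile interval per variable; index selects the column of b2$t.
res <- t(sapply(seq_along(b2$t0), function(k) {
  ci <- boot.ci(b2, conf = 0.95, type = "perc", index = k)
  c(rho   = unname(b2$t0[k]),  # correlation on the original data
    lower = ci$percent[4],     # lower percentile bound
    upper = ci$percent[5])     # upper percentile bound
}))
rownames(res) <- names(df)[-1]
res
```

Each row of `res` holds the correlation on the original data and the bounds of the percentile interval for one variable. Other interval types that `boot.ci` supports, such as `"bca"`, can be requested through `type`.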