Tags: r, correlation, statistics-bootstrap

Bootstrapped correlation with more than 2 variables in R


I am trying to calculate a bootstrapped correlation between six variables in R, but all the examples, solutions and tutorials I can find are written for two variables. I adapted one of them, but I am not sure whether I am getting correct output.

Let's say that this is my data:

library(tibble)

dado <- tibble(var1 = rnorm(104, mean = 7, sd = 1.5), var2 = rnorm(104, mean = 2.88, sd = 1.12),
               var3 = rnorm(104, mean = 1.55, sd = 0.8), var4 = rnorm(104, mean = 3.52, sd = 1.2),
               var5 = rnorm(104, mean = 2.67, sd = 0.94), var6 = rnorm(104, mean = 2.33, sd = 1.45))

I tried using the following code to bootstrap, but the output is not clear.

library(boot)

# Statistic for boot(): correlation matrix of the resampled rows
foo.matriz <- function(data, indices, cor.type = "pearson") {
  dt <- data[indices, ]
  cor(dt, method = cor.type)
}

boot_strap <- boot(data = dado, statistic = foo.matriz, R = 1000)

Should I interpret the output as follows: the first line is the correlation of the first variable with itself, the second line is the correlation of the first variable with the second variable, and so on? And when the number 1 appears again, does the cycle start over with the second variable?


Solution

  • Yes. But it likely doesn't matter: whether you read your 6x6 correlation matrix by row or by column, you end up with the same result, because a correlation matrix is symmetric (its upper and lower triangles mirror each other).

    If you are unsure how a matrix reshapes to a vector, consider this:

    
    m <- matrix( 1:36, nrow=6 )
    print(m)
    
    as.numeric( m )
    ## 1 2 3 4 5 6 7 ... (the default order is column by column, from the leftmost column to the rightmost)
    
    

    I cannot comment on the statistical soundness of your method, but the output is ordered the way you suspected.
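
Because the matrix is symmetric and is flattened column by column, one way to make the output easier to read is to return only the unique pairs and label them. The following is a minimal sketch, not a statement about the right statistical method. It assumes a plain data.frame with three variables instead of the six-column tibble (to keep it short), and the names cor_pairs, pair_names, est and reps are made up for illustration.

```r
library(boot)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
dado <- data.frame(var1 = rnorm(104, mean = 7,    sd = 1.5),
                   var2 = rnorm(104, mean = 2.88, sd = 1.12),
                   var3 = rnorm(104, mean = 1.55, sd = 0.8))

# Hypothetical statistic: return only the upper triangle of the
# correlation matrix, i.e. one value per unique pair of variables.
cor_pairs <- function(data, indices, cor.type = "pearson") {
  m <- cor(data[indices, ], method = cor.type)
  m[upper.tri(m)]
}

# Build labels in the same column-major order that m[upper.tri(m)] uses.
p   <- ncol(dado)
idx <- which(upper.tri(matrix(0, p, p)), arr.ind = TRUE)
pair_names <- paste(names(dado)[idx[, "row"]],
                    names(dado)[idx[, "col"]], sep = "~")

boot_pairs <- boot(data = dado, statistic = cor_pairs, R = 200)

# Labelled copies of the observed correlations and the bootstrap replicates.
est  <- setNames(boot_pairs$t0, pair_names)
reps <- boot_pairs$t
colnames(reps) <- pair_names

print(est)

# Percentile interval for the first pair (var1~var2); index picks the column.
print(boot.ci(boot_pairs, type = "perc", index = 1))
```

With the original function that returns the full matrix, the same column-major rule applies: for a 6x6 matrix, arrayInd(k, c(6, 6)) gives the row and column of the matrix cell that the k-th line of the output corresponds to.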