r, correlation, pearson

Adapt method cor.test for each data frame in a list


I want to set the method argument of cor.test in R separately for each data frame in a list of data frames.

data(iris)
iris.lst <- split(iris[, 1:2], iris$Species)
options(scipen=999)

normality1 <- lapply(iris.lst, function(x) shapiro.test(x[,1]))
p1 <- as.numeric(unlist(lapply(normality1, "[", c("p.value"))))
normality2 <- lapply(iris.lst, function(x)shapiro.test(x[,2]))
p2 <- as.numeric(unlist(lapply(normality2, "[", c("p.value"))))
try <- ifelse (p1 > 0.05 | p2 > 0.05, "spearman", "pearson")

# Because all of them are spearman:
try[3] <- "pearson"
for (i in 1: length(try)){
   results.lst <- lapply(iris.lst, function(x) cor.test(x[, 1], x[, 2], method=try[i]))
   results.stats <- lapply(results.lst, "[", c("estimate", "conf.int", "p.value"))
   stats <- do.call(rbind, lapply(results.stats, unlist))
   stats
}

But this does not run cor.test with its own method for each data frame. For example:

cor.test(iris.lst$versicolor[, 1], iris.lst$versicolor[, 2], method="pearson")
stats
# Should be spearman corr.coefficient but is pearson

Any advice?


Solution

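  • Let me check that I understand what you want to achieve. You have a list of data frames and a vector of corresponding methods (one method for each data frame). Your loop applies try[i] to every data frame on each pass and overwrites the previous results, so only the last method survives. If my assumption is correct, you need to index the data frame and the method together, something like this (instead of your for loop):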

    for (i in 1: length(try)){
      results.lst <- cor.test(iris.lst[[i]][, 1], iris.lst[[i]][, 2], method=try[i])
      print(results.lst)
    }
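
The same pairing can also be written without an explicit index. This is only a sketch, not the asker's code: the vector name cor.methods is a hypothetical stand-in for try (which shadows a base R function), and exact = FALSE is an added assumption that requests the asymptotic p-value so Spearman does not warn about ties.

```r
# Split the first two iris columns by species, as in the question.
iris.lst <- split(iris[, 1:2], iris$Species)

# Hypothetical per-species methods, named to match the list of data frames.
cor.methods <- c(setosa = "spearman", versicolor = "spearman", virginica = "pearson")

# Map() walks both inputs in parallel, so each data frame gets its own method.
# Indexing cor.methods by name guards against the two being in a different order.
results <- Map(
  function(df, m) cor.test(df[, 1], df[, 2], method = m, exact = FALSE),
  iris.lst,
  cor.methods[names(iris.lst)]
)

# Each element is an ordinary "htest" object; its method field shows what was run.
results$versicolor$method
results$virginica$method
```

Because the result is a named list, results$setosa, results$versicolor and results$virginica can be inspected individually.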
    

    Edit: There are many ways to get your stats; here's one. But first, a couple of notes:

    • I would find a way to make sure that the right method is used with the right dataset; in what follows I use named lists.
    • As far as I can tell, only the "pearson" method returns a confidence interval, so we have to handle the missing interval when building the stats (it becomes NA in the code that follows); alternatively, you can look only at the estimate and p-value.
    • We'll use sapply instead of a for loop to get the stats immediately as a table, and
    • the function t to transpose the table.

    names(try) <- names(iris.lst)
    t(
      sapply(names(try), 
           function(i) {
             result <- cor.test(iris.lst[[i]][, 1], iris.lst[[i]][, 2], method=try[[i]])
             to_return <- result[c("estimate", "p.value")]
             to_return["conf.int1"] <- ifelse(is.null(result[["conf.int"]]), NA, result[["conf.int"]][1])
             to_return["conf.int2"] <- ifelse(is.null(result[["conf.int"]]), NA, result[["conf.int"]][2])
             return(to_return)
             }
           )
      )
    

    output:

               estimate  p.value           conf.int1 conf.int2
    setosa     0.7553375 0.000000000231671 NA        NA       
    versicolor 0.517606  0.0001183863      NA        NA       
    virginica  0.4572278 0.0008434625      0.2049657 0.6525292
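
The sapply call above returns a matrix whose cells are list elements, which can be awkward to sort or filter. As an alternative sketch (not part of the original answer), the same numbers can be collected into a plain data frame with numeric columns. The names cor.methods, one_row, conf.low and conf.high are made up for illustration, and exact = FALSE is an added assumption that avoids the Spearman ties warning.

```r
iris.lst <- split(iris[, 1:2], iris$Species)

# Hypothetical named vector of methods, one per data frame.
cor.methods <- c(setosa = "spearman", versicolor = "spearman", virginica = "pearson")

# Build one data-frame row per species.
one_row <- function(name) {
  res <- cor.test(iris.lst[[name]][, 1], iris.lst[[name]][, 2],
                  method = cor.methods[[name]], exact = FALSE)
  # Only Pearson returns conf.int; fill with NA otherwise.
  ci <- if (is.null(res$conf.int)) c(NA_real_, NA_real_) else as.numeric(res$conf.int)
  data.frame(
    species   = name,
    method    = cor.methods[[name]],
    estimate  = unname(res$estimate),
    p.value   = res$p.value,
    conf.low  = ci[1],
    conf.high = ci[2]
  )
}

# Stack the rows into a single data frame.
stats <- do.call(rbind, lapply(names(iris.lst), one_row))
stats
```

Keeping the method as its own column makes it easy to see afterwards which coefficient each row reports.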