Tags: r, tidyverse, grouping, correlation, hypothesis-test

How to perform correlation test for each group in R and store results in a list?


Note: I tried to solve this using the answers to "In R, correlation test between two columns, for each of the groups in a third column", but was not successful.

I have the following data:

> x = data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
                 group = rep(c("A", "B"), 5),
                 y = runif(10))
> x
   year group          y
1  2019     A 0.26550866
2  2019     B 0.37212390
3  2020     A 0.57285336
4  2020     B 0.90820779
5  2021     A 0.20168193
6  2021     B 0.89838968
7  2022     A 0.94467527
8  2022     B 0.66079779
9  2023     A 0.62911404
10 2023     B 0.06178627

I would like to run a correlation test between year and y for each group. I can do this for each group individually:

> A = x %>% filter(group == "A")
> B = x %>% filter(group == "B")
> 
> cor.test(A$year, A$y)

    Pearson's product-moment correlation

data:  A$year and A$y
t = 0.67259, df = 3, p-value = 0.5494
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.7644073  0.9430671
sample estimates:
      cor 
0.3619872

> cor.test(B$year, B$y)

    Pearson's product-moment correlation

data:  B$year and B$y
t = 1.9909, df = 3, p-value = 0.1406
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3822520  0.9826436
sample estimates:
      cor 
0.7544519 

I'm trying to do this with a group_by statement so that it generalizes to the situation where I have many groups.

My (unsuccessful) attempts are:

> # Unsuccessful attempt 1
>
> x %>% group_by(group) %>% 
   group_map(~cor.test(x$year, x$y))

[[1]]

    Pearson's product-moment correlation

data:  x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5955411  0.6614485
sample estimates:
       cor 
0.05453354 


[[2]]

    Pearson's product-moment correlation

data:  x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5955411  0.6614485
sample estimates:
       cor 
0.05453354

This is not correct.
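Both list elements repeat the whole-data result (t = 0.15447, df = 8) because `x$year` and `x$y` inside the formula refer to the full data frame `x`, not to the current group. In a `group_map()` formula the rows of the current group are available as `.x`. A minimal sketch of that fix follows; the seed is an assumption added only to make the example reproducible, and naming the list from `group_keys()` is an optional extra.

```r
library(dplyr)

set.seed(1)  # assumption: a seed is set here only so the sketch is reproducible
x <- data.frame(year = rep(2019:2023, each = 2),
                group = rep(c("A", "B"), 5),
                y = runif(10))

# .x is the subset of rows belonging to the current group
tests <- x %>%
  group_by(group) %>%
  group_map(~ cor.test(.x$year, .x$y))

# group_map() returns an unnamed list; label it with the group keys
names(tests) <- x %>% group_by(group) %>% group_keys() %>% pull(group)

tests$A  # htest object for group A (df = 3, since each group has 5 rows)
```

Each element is now computed from 5 rows rather than all 10, which is why the degrees of freedom drop from 8 to 3.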

> # Unsuccessful attempt 2
> # (using https://stackoverflow.com/questions/14030697/in-r-correlation-test-between-two-columns-for-each-of-the-groups-in-a-third-co)
> library(plyr)
> daply(x, .(group), function(y) cor.test(y$year, y$y))
> # Error message
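The split-apply idea behind this attempt can also be sketched in base R, without plyr. This is an alternative rather than a diagnosis of the error above, since the error message itself is not shown; the seed is an assumption for reproducibility.

```r
set.seed(1)  # assumption: a seed is set here only so the sketch is reproducible
x <- data.frame(year = rep(2019:2023, each = 2),
                group = rep(c("A", "B"), 5),
                y = runif(10))

# split() returns a named list of data frames, one per group
by_group <- split(x, x$group)

# Run cor.test() on each piece; the result is a list named by group
tests <- lapply(by_group, function(d) cor.test(d$year, d$y))

names(tests)          # group labels
tests[["B"]]$p.value  # individual components can be pulled out by name
```

Because `split()` names the pieces by group, the resulting list can be indexed by group label instead of by position.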

Is there a way to obtain a list of correlation tests, one for each group?


Solution

  • You can use this simple code:

    library(dplyr)
    set.seed(1234)
    df <-  data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
                      group = rep(c("A", "B"), 5),
                      y = runif(10))
    
    test_list <- df %>% 
                 group_by(group) %>% 
                 summarize(cor_test=list(cor.test(year, y)))
    

    You can retrieve the results of cor.test for groups A and B in the following way:

    test_list$cor_test[[1]]
    
            Pearson's product-moment correlation
    
    data:  year and y
    t = 0.38263, df = 3, p-value = 0.7275
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     -0.8232277  0.9224262
    sample estimates:
          cor 
    0.2157109 
    

    and

    test_list$cor_test[[2]]
    
            Pearson's product-moment correlation
    
    data:  year and y
    t = -1.1663, df = 3, p-value = 0.3278
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     -0.9651836  0.6382303
    sample estimates:
          cor 
    -0.558549
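Since the question asks for a list, the list-column can be turned into a named list, and the key numbers can be pulled into one row per group. The following is a sketch built on the `summarize()` approach above; the names `tests`, `summary_tbl`, `estimate`, `p_value` and `dof` are illustrative choices, not part of the original answer.

```r
library(dplyr)

set.seed(1234)
df <- data.frame(year = rep(2019:2023, each = 2),
                 group = rep(c("A", "B"), 5),
                 y = runif(10))

test_list <- df %>%
  group_by(group) %>%
  summarize(cor_test = list(cor.test(year, y)))

# Named list of htest objects, indexable by group label
tests <- setNames(test_list$cor_test, test_list$group)

# One row per group with the correlation, p-value and degrees of freedom
summary_tbl <- test_list %>%
  mutate(estimate = sapply(cor_test, function(t) unname(t$estimate)),
         p_value  = sapply(cor_test, function(t) t$p.value),
         dof      = sapply(cor_test, function(t) unname(t$parameter))) %>%
  select(-cor_test)

tests$A
summary_tbl
```

Indexing by name (`tests$A`) avoids relying on the position of each group in the output, which matters once there are many groups.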