Note: I attempted to solve this using "In R, correlation test between two columns, for each of the groups in a third column", but was not successful.
I have the following data:
> x = data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
group = rep(c("A", "B"), 5),
y = runif(10))
> x
year group y
1 2019 A 0.26550866
2 2019 B 0.37212390
3 2020 A 0.57285336
4 2020 B 0.90820779
5 2021 A 0.20168193
6 2021 B 0.89838968
7 2022 A 0.94467527
8 2022 B 0.66079779
9 2023 A 0.62911404
10 2023 B 0.06178627
I would like to do correlation tests between year and variable y for each group. I can achieve this individually if I do:
> A = x %>% filter(group == "A")
> B = x %>% filter(group == "B")
>
> cor.test(A$year, A$y)
Pearson's product-moment correlation
data: A$year and A$y
t = 0.67259, df = 3, p-value = 0.5494
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7644073 0.9430671
sample estimates:
cor
0.3619872
> cor.test(B$year, B$y)
Pearson's product-moment correlation
data: B$year and B$y
t = 1.9909, df = 3, p-value = 0.1406
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3822520 0.9826436
sample estimates:
cor
0.7544519
I'm trying to do this with a group_by statement, to generalize to the situation where I have many groups.
My (unsuccessful) attempts are:
> # Unsuccessful attempt 1
>
> x %>% group_by(group) %>%
group_map(~cor.test(x$year, x$y))
[[1]]
Pearson's product-moment correlation
data: x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5955411 0.6614485
sample estimates:
cor
0.05453354
[[2]]
Pearson's product-moment correlation
data: x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5955411 0.6614485
sample estimates:
cor
0.05453354
This is not correct: both list elements are the same test run on all 10 rows (df = 8), not one test per group.
> # Unsuccessful attempt 2
> # (using https://stackoverflow.com/questions/14030697/in-r-correlation-test-between-two-columns-for-each-of-the-groups-in-a-third-co)
> library(plyr)
> daply(x, .(group), function(y) cor.test(y$year, y$y))
> # Error message
Is there a way to obtain a list of correlation tests, one for each group?
You can use this simple code:
library(dplyr)
set.seed(1234)
df <- data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
                 group = rep(c("A", "B"), 5),
                 y = runif(10))
test_list <- df %>%
  group_by(group) %>%
  summarize(cor_test = list(cor.test(year, y)))
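As a side note, the first attempt in the question can probably be made to work with a small change. Inside group_map(), the formula shorthand exposes the rows of the current group as .x, whereas x$year and x$y always refer to the full data frame, which would explain why both results came out identical. A minimal sketch, assuming a reasonably recent dplyr (group_map() and group_keys() were added in 0.8.0); the names tests and df are just for illustration:

```r
library(dplyr)

set.seed(1234)
df <- data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
                 group = rep(c("A", "B"), 5),
                 y = runif(10))

# .x is the slice of rows belonging to the current group.
# Referring to df$year / df$y here would rerun the test on all rows each time.
tests <- df %>%
  group_by(group) %>%
  group_map(~ cor.test(.x$year, .x$y))

# group_map() returns an unnamed list in group order, so attach the group keys.
names(tests) <- df %>% group_by(group) %>% group_keys() %>% pull(group)

tests$A
```

This gives a plain named list of htest objects rather than a tibble with a list column; which one is more convenient depends on what you do with the results afterwards. The rest of this answer uses test_list from the summarize() version.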
You can retrieve the results of cor.test for groups A and B in the following way:
test_list$cor_test[[1]]
Pearson's product-moment correlation
data: year and y
t = 0.38263, df = 3, p-value = 0.7275
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8232277 0.9224262
sample estimates:
cor
0.2157109
and
test_list$cor_test[[2]]
Pearson's product-moment correlation
data: year and y
t = -1.1663, df = 3, p-value = 0.3278
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.9651836 0.6382303
sample estimates:
cor
-0.558549
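With many groups, indexing by position gets error-prone, so it may help to name the list by group and pull the headline numbers into one table. The sketch below rebuilds test_list as above and uses only base R for the extraction; the names tests, results, cor and p_value are my own choices, not anything cor.test() prescribes. It relies on the fact that cor.test() returns an htest object with estimate and p.value components.

```r
library(dplyr)

set.seed(1234)
df <- data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
                 group = rep(c("A", "B"), 5),
                 y = runif(10))

test_list <- df %>%
  group_by(group) %>%
  summarize(cor_test = list(cor.test(year, y)))

# Name each test by its group so it can be looked up as tests[["A"]].
tests <- setNames(test_list$cor_test, test_list$group)

# One row per group with the correlation estimate and its p-value.
results <- data.frame(
  group   = names(tests),
  cor     = vapply(tests, function(ht) unname(ht$estimate), numeric(1), USE.NAMES = FALSE),
  p_value = vapply(tests, function(ht) ht$p.value, numeric(1), USE.NAMES = FALSE)
)
results
```

Here tests[["B"]] prints the same output as test_list$cor_test[[2]], and results can be filtered or sorted like any other data frame.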