r dplyr pearson-correlation significance

Calculate significance of correlation in grouped data with dplyr


I have grouped data on which I would like to run several basic inferential statistics.

library(tidyverse)

df <- data.frame(x=runif(50, min = 0, max = 25),y=runif(50, min = 10, max = 25), group=rep(0:1,25))

df %>%
  group_by(group) %>%
  summarize(cor(x,y))

Here I can easily get the correlation for each group, but I also need to check its statistical significance. Unfortunately, putting cor.test() straight into summarize() does not work in dplyr. Is there an easy workaround?


Solution

  • Could this be what you want?

    df %>%
        group_by(group) %>%
        # cor.test() returns a list; keep only the p-value for each group
        summarize(p_value = cor.test(x, y)[["p.value"]])
    

    The thing is that cor.test() returns a list (an object of class "htest") rather than a single value, so you need to pick out the element you are interested in, here p.value.
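If you also want the correlation coefficient next to the p-value, one possible sketch is to store the whole test result in a list column and then pull out the pieces you need, so cor.test() runs only once per group. The column names `test`, `r` and `p_value` and the `set.seed()` call are my own additions for illustration, not something from the question.

```r
library(dplyr)

set.seed(1)  # assumption: added only so the sketch is reproducible
df <- data.frame(
  x = runif(50, min = 0, max = 25),
  y = runif(50, min = 10, max = 25),
  group = rep(0:1, 25)
)

res <- df %>%
  group_by(group) %>%
  # Wrap the htest object in list() so summarize() can store it in one cell
  summarize(test = list(cor.test(x, y)), .groups = "drop") %>%
  mutate(
    # "estimate" holds Pearson's r, "p.value" the two-sided p-value
    r       = vapply(test, function(ht) unname(ht[["estimate"]]), numeric(1)),
    p_value = vapply(test, function(ht) ht[["p.value"]], numeric(1))
  ) %>%
  select(-test)

res
```

Running names(cor.test(df$x, df$y)) lists the other elements you could extract in the same way, such as "statistic" or "conf.int".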