Tags: r, dplyr, purrr, magrittr

Apply a function over groups or factors with dplyr


I want to apply a function, e.g. shapiro.test(), to each of several groups in a dataset.

First I tried:

library(tidyverse)
library(magrittr)

mtcars %>% group_by(cyl) %$% shapiro.test(wt)$p.value
#> [1] 0.09265499
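The single p-value has a simple explanation: magrittr's %$% works like with(), exposing the columns of the left-hand side to the right-hand expression. The grouping added by group_by() is not consulted, so wt refers to the whole column and the test runs once on all 32 cars. A quick check, assuming the same packages as above:

```r
library(dplyr)
library(magrittr)

# %$% evaluates the expression with the data frame's columns in scope,
# much like with(); the grouping metadata is not used.
grouped_p <- mtcars %>% group_by(cyl) %$% shapiro.test(wt)$p.value

# Running the test on the full, ungrouped column gives the same p-value.
whole_p <- shapiro.test(mtcars$wt)$p.value

identical(grouped_p, whole_p)
#> [1] TRUE
```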

But that did not iterate over the groups as I expected; it returned a single p-value for the whole wt column. Then I tried wrapping the test in a function that returns its result as a data frame, since that was the approach taken in another Stack Overflow question.

checkNorm <- function(x) {
  return(data.frame(P = shapiro.test(x)$p.value))
}

mtcars %>% group_by(cyl) %$% checkNorm(wt)
#>            P
#> 1 0.09265499
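The checkNorm() idea itself is sound; what is missing is something that calls it once per group instead of once on the whole column. A minimal sketch using dplyr's group_modify() (available in dplyr 0.8.0 and later), which applies a function to each group's rows and expects a data frame back. The name by_cyl is only illustrative:

```r
library(dplyr)

# Same helper as above: returns the Shapiro-Wilk p-value as a one-row data frame.
checkNorm <- function(x) {
  data.frame(P = shapiro.test(x)$p.value)
}

# group_modify() passes each group's rows (without the grouping column) as .x,
# then row-binds the returned data frames and re-attaches the cyl column.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ checkNorm(.x$wt)) %>%
  ungroup()

by_cyl
```

This keeps a reusable helper, which is convenient when the function returns several columns rather than a single number.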

What is the appropriate way to make a function iterate over the groups defined by group_by()?


Solution

  • Use summarise() to create a new column holding the p-value of each group:

    library(dplyr)
    
    mtcars %>% 
      group_by(cyl) %>%
      summarise(p_val = shapiro.test(wt)$p.value)
    
    #   cyl   p_val
    #  <dbl>   <dbl>
    #1     4 0.570  
    #2     6 0.131  
    #3     8 0.00275
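The same per-group p-values can also be computed by splitting the data first, without group_by(). A sketch of two alternatives, one with purrr::map_dbl() and one in base R with tapply(); the variable names p_purrr and p_base are only illustrative:

```r
library(purrr)

# split() returns a named list of data frames, one per value of cyl
# (names "4", "6", "8"); map_dbl() runs the test on each element and
# returns a named numeric vector.
p_purrr <- map_dbl(split(mtcars, mtcars$cyl), ~ shapiro.test(.x$wt)$p.value)

# Base R: tapply() groups wt by cyl and applies the function to each subset,
# returning a one-dimensional array named by the cyl values.
p_base <- tapply(mtcars$wt, mtcars$cyl, function(x) shapiro.test(x)$p.value)

p_purrr
p_base
```

The summarise() version is usually the most readable inside a dplyr pipeline; the split-based forms are handy when the result is wanted as a plain named vector.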