
Is there a way to apply shapiro.test() to a column based on specific category of another column?


I have a data frame named df, as you can see in the picture. I want to apply the Shapiro-Wilk test (shapiro.test()) to the column "value", separately for each category in the column "color". My code is below; it gives the following error: "Caused by error: ! shapiro.test(value) must be a vector, not a <htest> object." I would appreciate your suggestions.

 df %>%
  group_by(color) %>%
  summarise(shapiro.test(value))



Solution

  • Users on this site seem to be overusing tidyverse solutions when simple base R solutions exist. Here is one, using some simulated data:

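    # Simulated data: 100 values for each color
    df <- data.frame(value=rnorm(200), color=c(rep("blue", 100), rep("red", 100)))
    
    # Run shapiro.test() on value separately for each level of color
    with(df, tapply(value, color, shapiro.test))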
    $blue
    
        Shapiro-Wilk normality test
    
    data:  X[[i]]
    W = 0.98655, p-value = 0.4078
    
    
    $red
    
        Shapiro-Wilk normality test
    
    data:  X[[i]]
    W = 0.98544, p-value = 0.3417
    
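If only the test statistics are needed, the list of htest objects that tapply() returns can be reduced to plain named vectors. This is a minimal sketch; set.seed() is added only so the simulated data is reproducible, and the names res, p_values and W_stats are illustrative, not part of the original answer.

```r
# Simulated data, as above (set.seed added for reproducibility)
set.seed(1)
df <- data.frame(value = rnorm(200), color = c(rep("blue", 100), rep("red", 100)))

# One htest object per level of color, stored in a 1-d list array
res <- with(df, tapply(value, color, shapiro.test))

# Extract the p-value and the W statistic from each test as named numeric vectors
p_values <- sapply(res, `[[`, "p.value")
W_stats  <- sapply(res, function(t) unname(t$statistic))

p_values
W_stats
```

The names of the resulting vectors come from the levels of color, so each value stays labelled with its group.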

    with() and its cousin within() are very useful: they make for clean code and seem to be underused.
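To illustrate the difference between the two: with() evaluates an expression using the data frame's columns and returns the expression's value, while within() evaluates assignments inside the data frame and returns the modified data frame. The sketch below uses within() together with base R's ave() to standardize value separately within each color; the column name z and the use of ave() are illustrative choices, not part of the original answer.

```r
set.seed(1)
df <- data.frame(value = rnorm(200), color = c(rep("blue", 100), rep("red", 100)))

# with(): refer to columns directly and get back the result of the expression
overall_mean <- with(df, mean(value))

# within(): assign a new column inside df and get back the updated data frame;
# ave() applies FUN to value within each level of color
df <- within(df, z <- ave(value, color, FUN = function(v) (v - mean(v)) / sd(v)))

head(df)
```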

    An alternative syntax using the native pipe (available since R 4.1.0) is

    df |> with(tapply(value, color, shapiro.test))
    

    resulting in exactly the same output as above.
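For readers who want to keep the dplyr pipeline from the question: the error arises because summarise() expects each summary expression to return a vector, while shapiro.test() returns an htest object, which is a list. Extracting the numeric components inside summarise() avoids the error. This is a sketch assuming the dplyr package is installed; it calls shapiro.test() twice per group for simplicity, and the column names W and p.value are just illustrative.

```r
library(dplyr)

set.seed(1)
df <- data.frame(value = rnorm(200), color = c(rep("blue", 100), rep("red", 100)))

# Pull plain numbers out of each htest object so summarise() receives vectors
res <- df %>%
  group_by(color) %>%
  summarise(
    W       = unname(shapiro.test(value)$statistic),
    p.value = shapiro.test(value)$p.value
  )

res
```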