Tags: r, normal-distribution

I have two different classes in one column. How do I test the normality of each of them?


I'm new to R. Here is a simplified version of my situation (my real data are much more complex):

set.seed(100)
df = data.frame(SEX=sample(c("M","F"),100,replace=TRUE),BW = rnorm(100,80,2))

One column is SEX (male or female) and the other is BW (body weight). I want to test the normality of body weight for males and for females separately. Then I can test for equality of variances, and finally run a t-test or another appropriate test. But shapiro.test doesn't accept a formula, so something like shapiro.test(BW ~ SEX, data = df) doesn't work.

What should I do? I don't want to separate the data frame or make new subsets.
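The later steps in the question, the variance comparison and the two-sample test, need no subsets at all: base R's var.test() and t.test() both have formula methods that take response ~ group with a two-level grouping variable. A minimal sketch on the question's example data (the names vt and tt are just illustrative):

```r
# Recreate the example data from the question
set.seed(100)
df <- data.frame(SEX = sample(c("M", "F"), 100, replace = TRUE),
                 BW  = rnorm(100, 80, 2))

# F test comparing the variances of BW between the two SEX groups
vt <- var.test(BW ~ SEX, data = df)
print(vt)

# Two-sample t-test on BW by SEX; by default t.test() uses the Welch
# version (var.equal = FALSE), which does not assume equal variances
tt <- t.test(BW ~ SEX, data = df)
print(tt)
```

Pass var.equal = TRUE for the classic pooled-variance t-test; if normality looks doubtful, wilcox.test(BW ~ SEX, data = df) accepts the same formula.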

Thanks in advance~!


Solution

  • A "tidyverse" solution to this problem is described in detail here: Running a model on separate groups.

    Briefly, using your data:

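    library(dplyr) # for mutate (and the %>% pipe)
    library(tidyr) # for nest/unnest
    library(purrr) # for map
    library(broom) # for glance
    
    df %>% 
      nest(data = c(BW)) %>%                            # one row per SEX; BW moves into a list-column
      mutate(model = map(data, ~ shapiro.test(.x$BW)),  # Shapiro-Wilk test on each group's BW
             g = map(model, glance)) %>%                # one-row summary of each test
      unnest(g)                                         # expand the summaries into ordinary columns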
    

    Result:

    # A tibble: 2 x 6
      SEX             data model   statistic p.value method                     
      <fct> <list<df[,1]>> <list>      <dbl>   <dbl> <chr>                      
    1 F           [50 x 1] <htest>     0.982   0.639 Shapiro-Wilk normality test
    2 M           [50 x 1] <htest>     0.980   0.535 Shapiro-Wilk normality test
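If you'd rather not load four packages, base R's aggregate() accepts the same BW ~ SEX formula the question tried with shapiro.test(), and applies a function to each group without you creating the subsets yourself. A minimal sketch, assuming you only need the W statistic and p-value (the anonymous wrapper function and the names sw, W and p.value are illustrative choices, not part of any package API):

```r
# Recreate the example data from the question
set.seed(100)
df <- data.frame(SEX = sample(c("M", "F"), 100, replace = TRUE),
                 BW  = rnorm(100, 80, 2))

# aggregate() splits BW by SEX internally and calls FUN on each group;
# this FUN returns the Shapiro-Wilk W statistic and p-value as a named vector
sw <- aggregate(BW ~ SEX, data = df,
                FUN = function(x) {
                  res <- shapiro.test(x)
                  c(W = unname(res$statistic), p.value = res$p.value)
                })
print(sw)
```

Because FUN returns two values per group, aggregate() stores them as a matrix column, so the p-values are available as sw$BW[, "p.value"].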