I'm new to R. Here is a simplified version of my situation (my real data are much more complex):
set.seed(100)
df = data.frame(SEX=sample(c("M","F"),100,replace=TRUE),BW = rnorm(100,80,2))
One column is SEX (male or female); the other is BW (body weight).
I want to test the normality of body weight separately for males and for females. Then I can test the equality of variances between the two groups, and finally run a t-test or another suitable test.
But shapiro.test can't be used this way: it does not accept a formula, so a call like shapiro.test(BW~SEX,data=df) fails.
What should I do? I don't want to separate the data frame or create new subsets.
Thanks in advance~!
A "tidyverse" solution to this problem is described in detail here: Running a model on separate groups.
Briefly, using your data:
library(dplyr) # for mutate
library(tidyr) # for nest/unnest
library(purrr) # for map
library(broom) # for glance
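df %>%
  nest(data = c(BW)) %>%                            # one row per SEX; BW goes into a list-column
  mutate(model = map(data, ~ shapiro.test(.x$BW)),  # run the test within each group
         g = map(model, glance)) %>%                # summarise each htest as a one-row tibble
  unnest(g)                                         # expand statistic, p.value, method into columns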
Result:
# A tibble: 2 x 6
  SEX   data           model   statistic p.value method
  <fct> <list<df[,1]>> <list>      <dbl>   <dbl> <chr>
1 F     [50 x 1]       <htest>     0.982   0.639 Shapiro-Wilk normality test
2 M     [50 x 1]       <htest>     0.980   0.535 Shapiro-Wilk normality test
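If you would rather avoid extra packages, here is a minimal base R sketch of the same idea. It is an alternative to the tidyverse approach above, not part of it. split() groups BW by SEX on the fly, so no subset objects are stored, and var.test() and t.test() both have formula methods, so the follow-up tests mentioned in the question can be run on df directly. The name p_values is only illustrative.

```r
# Same simulated data as in the question
set.seed(100)
df <- data.frame(SEX = sample(c("M", "F"), 100, replace = TRUE),
                 BW  = rnorm(100, 80, 2))

# Shapiro-Wilk p-value for each level of SEX;
# split() returns a named list of BW vectors, one per group
p_values <- sapply(split(df$BW, df$SEX),
                   function(x) shapiro.test(x)$p.value)
p_values

# F test for equality of the two group variances (formula interface)
var.test(BW ~ SEX, data = df)

# Two-sample t test; the default is Welch's test (unequal variances).
# Pass var.equal = TRUE only if the variance test supports it.
t.test(BW ~ SEX, data = df)
```

If normality looks doubtful in either group, wilcox.test(BW ~ SEX, data = df) is one common non-parametric alternative with the same formula interface.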