I have just started learning R for data analysis. Here is my problem.
I want to analyse the difference in body weight (BW) between males and females in different species. (For example, that in Sorex gracilliums male and female body weights are significantly different. This is just an example; I don't know the answer. :)) At first I thought I could divide the data by Species into several groups. (This can indeed be done in Excel, but I have too many files, so I think R is the better choice.) Then I could use some simple code to test for a sex difference. But I don't know how to divide the data or how to make new data frames. I tried group_split. It does split the data, but I just get many tibbles, as the image shows.
What should I do? Or is there a better way to test the difference?
I am not a native English speaker, so there may be many grammar mistakes, but I would really appreciate your help!
Assuming your data is in a data.frame called df, with columns NO, SPECIES, SEX, BW:
set.seed(100)
df = data.frame(NO = 1:100,
                SPECIES = sample(LETTERS[1:4], 100, replace = TRUE),
                SEX = sample(c("M", "F"), 100, replace = TRUE),
                BW = rnorm(100, 80, 2))
Then we give species D a sex effect by adding 5 to the body weight of its males:
df$BW[df$SPECIES=="D" & df$SEX=="M"] = df$BW[df$SPECIES=="D" & df$SEX=="M"] + 5
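Before testing, it can help to confirm that the simulated effect is really in the data. The following is a minimal sketch using base R's aggregate(); it rebuilds the same simulated df so it runs on its own, and the names is_d_male and group_means are only illustrative.

```r
# Rebuild the simulated data so this block is self-contained
set.seed(100)
df <- data.frame(NO = 1:100,
                 SPECIES = sample(LETTERS[1:4], 100, replace = TRUE),
                 SEX = sample(c("M", "F"), 100, replace = TRUE),
                 BW = rnorm(100, 80, 2))

# Add 5 to the body weight of species D males (the simulated sex effect)
is_d_male <- df$SPECIES == "D" & df$SEX == "M"
df$BW[is_d_male] <- df$BW[is_d_male] + 5

# Mean body weight for every SPECIES x SEX combination
group_means <- aggregate(BW ~ SPECIES + SEX, data = df, FUN = mean)
print(group_means)
```

Only the row for species D males should sit clearly above 80; the exact values depend on your R version's random number generator.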
To run the test for a single species, say species A, subset the data and call t.test:
dat = subset(df,SPECIES=="A")
t.test(BW ~ SEX,data=dat)
This prints the test statistic, p-value, confidence interval, and the two group means (by default, t.test runs Welch's unequal-variance test). To do this systematically for every species, we can use dplyr and broom:
library(dplyr)
library(broom)
df %>% group_by(SPECIES) %>% do(tidy(t.test(BW ~ SEX,data=.)))
# A tibble: 4 x 11
# Groups:   SPECIES [4]
  SPECIES estimate estimate1 estimate2 statistic p.value parameter conf.low
  <fct>      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
1 A          0.883      80.4      79.6     0.936 3.65e-1      14.2   -1.14
2 B          0.259      80.2      79.9     0.377 7.12e-1      14.1   -1.21
3 C          0.170      80.1      79.9     0.359 7.23e-1      25.3   -0.807
4 D         -5.55       79.7      85.2    -7.71  1.29e-7      21.4   -7.05
If you don't want to install any packages, this will give you all the test results:
by(df, df$SPECIES, function(x)t.test(BW ~ SEX,data=x))
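Since the question asks how to divide the data into separate data frames, here is a minimal base R sketch of that step using split() and lapply(). It rebuilds the same simulated df so it runs on its own; the names by_species, tests, and p_values are only illustrative.

```r
# Rebuild the simulated data so this block is self-contained
set.seed(100)
df <- data.frame(NO = 1:100,
                 SPECIES = sample(LETTERS[1:4], 100, replace = TRUE),
                 SEX = sample(c("M", "F"), 100, replace = TRUE),
                 BW = rnorm(100, 80, 2))
is_d_male <- df$SPECIES == "D" & df$SEX == "M"
df$BW[is_d_male] <- df$BW[is_d_male] + 5

# split() returns a named list with one ordinary data frame per species
by_species <- split(df, df$SPECIES)
print(names(by_species))
print(head(by_species[["A"]]))

# Run the same t-test on every piece, then pull out the p-values
tests <- lapply(by_species, function(d) t.test(BW ~ SEX, data = d))
p_values <- sapply(tests, function(tt) tt$p.value)
print(p_values)
```

Each element, such as by_species[["A"]], can be used like any other data frame, and tests[["A"]] holds the full t.test result for that species.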
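To combine them into one data.frame:
# Return the two group means and the p-value for one species as a one-row data.frame
func = function(x) {
  res = t.test(BW ~ SEX, data = x)
  data.frame(estimate_1 = res$estimate[1], estimate_2 = res$estimate[2], p = res$p.value)
}
# Apply func to each species and stack the rows; the row names are the species
do.call(rbind, by(df, df$SPECIES, func))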