I have a data set that includes the number of individuals from different genetic lines (line) of fruit flies in the study (n) and the number that survived (alive). This is broken up into replicates (rep). The data frame looks like this:
line rep n alive trt
1 21 1 5 2 control
2 21 2 5 4 control
3 26 1 5 1 control
4 26 2 5 4 control
In order to fit a binomial model, I want to convert the fraction (alive/n) to count data, with one row per individual and a 0/1 survival indicator (surv). So far I have been doing this manually (which is very painstaking), creating a data frame like this:
line rep trt surv
1 21 1 control 0
2 21 1 control 0
3 21 1 control 0
4 21 1 control 1
5 21 1 control 1
6 21 2 control 0
7 21 2 control 1
8 21 2 control 1
9 21 2 control 1
10 21 2 control 1
11 26 1 control 0
12 26 1 control 0
13 26 1 control 0
14 26 1 control 0
15 26 1 control 1
16 26 2 control 0
17 26 2 control 1
18 26 2 control 1
19 26 2 control 1
20 26 2 control 1
This allows me to create a model where survival is the response variable, the interaction between line and treatment (trt) is a major effect and rep is a random effect. The model works; the issue is how much time it takes to generate this data frame (I have a total of 139 lines with 5 reps each). Can someone please help me write a function, or point me to a function or package that will do this? Is there an easier way to do this?
Thanks in advance,
Phil
With your sample data:
dd <- read.table(text=" line rep n alive trt
1 21 1 5 2 control
2 21 2 5 4 control
3 26 1 5 1 control
4 26 2 5 4 control", header=TRUE)
You can use dplyr and tidyr to help:
library(dplyr)
library(tidyr)

dd %>%
  mutate(dead = n - alive) %>%              # number that did not survive
  select(-n) %>%
  gather(status, count, c(alive, dead)) %>% # one row per line/rep/status
  slice(rep(1:n(), .$count)) %>%            # repeat each row `count` times
  select(-count) %>%
  transform(surv = ifelse(status == "alive", 1, 0), status = NULL) %>%
  arrange(line, rep, trt, surv)
We use gather() to create separate rows for surv=0 and surv=1, and we use slice() to replicate the desired rows.
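If you would rather not depend on extra packages, the same expansion can be sketched in base R. This is only one possible alternative to the dplyr/tidyr pipeline above, not a replacement for it; the name long is made up for illustration, and the sketch assumes that alive never exceeds n in any row.

```r
# Same sample data as above
dd <- read.table(text = " line rep n alive trt
1 21 1 5 2 control
2 21 2 5 4 control
3 26 1 5 1 control
4 26 2 5 4 control", header = TRUE)

# Repeat each original row once per individual (n times)
long <- dd[rep(seq_len(nrow(dd)), times = dd$n), c("line", "rep", "trt")]

# For each original row: (n - alive) zeros followed by `alive` ones
long$surv <- unlist(
  Map(function(n, alive) rep(c(0, 1), times = c(n - alive, alive)),
      dd$n, dd$alive),
  use.names = FALSE
)

# Reset the row names left over from the row repetition
rownames(long) <- NULL
long
```

Because rep() is driven directly by the n and alive columns, the same code should scale to the full data set (139 lines with 5 reps each) without any manual steps.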