r, model, survival-analysis

Convert Survival Fraction data to Binomial count data in R?


I have a data set that records, for different genetic lines (line) of fruit flies, the number of individuals in the study (n) and the number that survived (alive). The data are broken up into replicates (rep). The data frame looks like this:

  line rep n alive     trt
1   21   1 5     2 control
2   21   2 5     4 control
3   26   1 5     1 control
4   26   2 5     4 control

In order to fit a binomial model, I want to convert the fraction (alive/n) to count data. So far I have been doing this manually, which is very painstaking, by creating a data frame like this:

    line  rep trt        surv
1     21   1  control    0
2     21   1  control    0
3     21   1  control    0
4     21   1  control    1
5     21   1  control    1
6     21   2  control    0
7     21   2  control    1
8     21   2  control    1
9     21   2  control    1
10    21   2  control    1
11    26   1  control    0
12    26   1  control    0
13    26   1  control    0
14    26   1  control    0
15    26   1  control    1
16    26   2  control    0
17    26   2  control    1
18    26   2  control    1
19    26   2  control    1
20    26   2  control    1

This allows me to create a model where survival is the response variable, the interaction between line and treatment (trt) is a major effect, and rep is a random effect. The model works; the issue is how much time it takes to generate this data frame (I have a total of 139 lines with 5 reps each). Can someone please help me write a function, or point me to an existing function or package, that will do this? Is there an easier way?

Thanks in advance,

Phil


Solution

  • With your sample data

    dd <- read.table(text="    line rep  n   alive    trt
    1    21   1   5   2        control
    2    21   2   5   4        control
    3    26   1   5   1        control
    4    26   2   5   4        control", header=TRUE)
    

    You can use dplyr and tidyr to expand the counts into one row per individual:

    library(dplyr)
    library(tidyr)

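    dd %>% mutate(dead=n-alive) %>% select(-n) %>%          # number of deaths per row
        gather(status, count, c(alive,dead)) %>%            # one row per status (alive/dead)
        slice(rep(1:n(), .$count)) %>% select(-count) %>%   # repeat each row `count` times
        transform(surv=ifelse(status=="alive",1,0), status=NULL) %>%  # recode status as 1/0
        arrange(line, rep, trt, surv)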
    

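    We use gather() to create a separate row for each outcome (alive and dead) within every line and replicate, slice() to repeat each of those rows as many times as its count, and transform() to recode the status as surv=1 (alive) or surv=0 (dead).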
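If you would rather avoid extra packages, the same expansion can be sketched in base R with rep(). This is only a minimal sketch under the assumption that the data have the columns shown in the question; the names idx and long are introduced here for illustration and are not part of the original answer.

```r
# Sample data from the question
dd <- data.frame(
  line  = c(21, 21, 26, 26),
  rep   = c(1, 2, 1, 2),
  n     = c(5, 5, 5, 5),
  alive = c(2, 4, 1, 4),
  trt   = "control"
)

# Repeat each row index n times, giving one output row per individual fly
idx  <- rep(seq_len(nrow(dd)), times = dd$n)
long <- dd[idx, c("line", "rep", "trt")]

# For each original row, emit (n - alive) zeros followed by `alive` ones
long$surv <- unlist(Map(
  function(n, alive) rep(c(0, 1), times = c(n - alive, alive)),
  dd$n, dd$alive
))

# Drop the duplicated row names left over from indexing
rownames(long) <- NULL
long
```

Because the rows are expanded in their original order, the result is already sorted by line and rep, with the zeros listed before the ones inside each replicate, as in the hand-built table in the question.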