r dplyr conditional-statements survival-analysis summarize

How can I combine summarize with multiple ifelse conditions using dplyr in R?


My ecological survival data looks like this:

df <- data.frame (
  ID = c(1,1,1,2,2,2,3,3,3),
  Timepoint = c(1,2,3,1,2,3,1,2,3),
  Days = c(0,22,198,0,21,199,0,23,197),
  Status = c("Alive","Dead","Dead","Alive","Alive","Missing","Alive","Alive","Alive"))

I would like it summarised into one row per ID, under the following conditions:

  • If Status changes to Dead, Days becomes the midpoint between that timepoint and the last timepoint at which the ID was recorded as Alive.
  • If Status changes to Missing, Days becomes the value at the last timepoint where Status was Alive.
  • If Status stays Alive until the last timepoint, Days becomes the value at the last timepoint.

Note: all IDs start out Alive and either stay Alive or change to Dead or Missing, then stay in that category. Ideally, I would also like a new column in which every ID that changed to Dead gets a 1, and those that stayed Alive or went Missing get a 0.

Example of new data frame:

ID SurvAge Event
1 11 1
2 21 0
3 197 0

I tried the following code but can't get it to work and would really appreciate some help!

data2 = data %>%
  group_by (ID) %>%
  summarize(SurvAge =
  if_else(!is.na(match(Status, "Missing")),
  Days[which(Status="Alive", last())],
  if_else(!is.na(match(Status,"Dead")),
  mean(Days[which(Status="Alive",last()):which(Status="Dead", first)])),
  if_else(Days[which(Status="Alive", last())])),
  Event=(sum(match(Status, "Dead"), na.rm = TRUE) == 1))

data2 = data %>%
 group_by (ID) %>%
 summarize(SurvAge = 
    if(Timepoint == 2 & Status== "Missing")
      {Days[which(data$Status =="Alive", last())]}
    else if (Timepoint == 2 & Status=="Dead")
      {mean(Days[which(Status="Alive",last()):which(Status="Dead", first)])}
    else if(Timepoint == 3 & Status== "Missing")
    {Days[which(data$Status =="Alive", last())]}
    else if (Timepoint == 3 & Status=="Dead")
    {mean(Days[which(Status="Alive",last()):which(Status="Dead", first)])}
    else {Days(max())})

Solution

  • In the sample data, the first row of the new data frame can't be 110; it should be 11. Starting from the code you provided, I came up with this solution:

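    library(dplyr)
    library(magrittr)
    
    df <- data.frame(
      ID = c(1,1,1,2,2,2,3,3,3),
      Timepoint = c(1,2,3,1,2,3,1,2,3),
      Days = c(0,22,198,0,21,199,0,23,197),
      Status = c("Alive","Dead","Dead","Alive","Alive","Missing","Alive","Alive","Alive"))
    
    # Event: 1 if the ID was ever recorded as Dead, 0 otherwise
    newdf <- df %>%
      group_by(ID) %>%
      summarize(Event = as.numeric("Dead" %in% Status))
    
    # SurvAge: apply the rule to every row of one ID, then keep the value for its last row
    newdf$SurvAge <- sapply(unique(df$ID),
           function(i){
             df %>% filter(ID == i) %>%
               summarise(Q = case_when(
                 # Alive: Days at the latest Alive record
                 Status == "Alive" ~ max(Days[which(Status == "Alive")]),
                 # Missing: Days at the last Alive record
                 Status == "Missing" ~ Days[which(Status == "Alive") %>% last],
                 # Dead: mean of Days from the last Alive row through the first Dead row;
                 # tryCatch returns 0 when the ID has no Dead row and the range fails
                 Status == "Dead" ~ tryCatch(mean(Days[last(which(Status == "Alive")):first(which(Status == "Dead"))]),
                                             error = function(e) 0)
               )) %>% slice_tail(n = 1)
             }) %>% unlist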
    

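    This works under the assumption that every ID includes at least one "Alive" status. I used tryCatch because case_when evaluates the right-hand side of every condition, and not every ID has a "Dead" status: for those IDs, first(which(Status=="Dead")) has nothing to return, so building the index range fails and the fallback value 0 is used instead (it is never selected, since no row of that ID is "Dead").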
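
As a possible alternative, the same rules can be sketched in a single grouped summarise without sapply or tryCatch. This is only a sketch under the assumptions stated in the question (every ID has at least one "Alive" row, and a status never reverts once it changes); the names last_alive, first_dead and result are made up for illustration and are not part of the original code.

```r
library(dplyr)

df <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3),
  Timepoint = c(1,2,3,1,2,3,1,2,3),
  Days = c(0,22,198,0,21,199,0,23,197),
  Status = c("Alive","Dead","Dead","Alive","Alive","Missing","Alive","Alive","Alive"))

result <- df %>%
  group_by(ID) %>%
  summarise(
    # Days at the last record where the ID was still Alive
    # (assumes at least one Alive row per ID)
    last_alive = max(Days[Status == "Alive"]),
    # Days at the first Dead record, or NA if the ID never died
    first_dead = if (any(Status == "Dead")) min(Days[Status == "Dead"]) else NA_real_,
    # 1 if the ID was ever recorded as Dead, 0 otherwise
    Event = as.numeric(any(Status == "Dead"))
  ) %>%
  # Dead: midpoint between last Alive and first Dead; otherwise: last Alive
  mutate(SurvAge = if_else(Event == 1, (last_alive + first_dead) / 2, last_alive)) %>%
  select(ID, SurvAge, Event)

print(result)
```

Because "Missing" and "still Alive" both reduce to the Days value of the last Alive record, only the Dead case needs its own branch here.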