
Creating a column of 0s and 1s based on inequalities of three date columns


I would like to create a column of 0s and 1s based on inequalities of three columns of dates.

The idea is as follows: if event_date is before death_date or study_over, the column event should be 1; if event_date occurs after death_date or study_over, event should be 0. Both event_date and death_date may contain NAs.

set.seed(1337)
rand_dates <- Sys.Date() - 365:1

df <- 
data.frame(
   event_date = sample(rand_dates, 20),
   death_date = sample(rand_dates, 20),
   study_over = sample(rand_dates, 20)
)

My attempt was the following:

eventR <- 
    function(x, y, z){
    if(is.na(y)){
        ifelse(x <= z, 1, 0)
    } else if(y <= z){
        ifelse(x < y, 1, 0)
    } else {
        ifelse(x <= z, 1, 0)
    }
    }

I use it as follows:

library(dplyr)
df[c(3, 5, 7), "event_date"] <- NA #there are some NA in .$event_date
df[c(3, 4, 6), "death_date"] <- NA #there are some NA in .$death_date

df %>%
mutate(event = sapply(.$event_date, eventR, y = .$death_date, z = .$study_over))
##Error: wrong result size (400), expected 20 or 1
##In addition: There were 40 warnings (use warnings() to see them)

I can't figure out how to do this. Any suggestions?
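Where the 400 comes from: `sapply()` iterates over `event_date` one element at a time, but `y` and `z` receive the whole `death_date` and `study_over` columns. Inside each call, `if()` therefore sees a condition of length 20 and uses only its first element (one warning for `is.na(y)` and one for `y <= z`, so 40 over 20 calls; R 4.2.0 and later raise an error here instead), and `ifelse()` returns 20 values, so `sapply()` assembles a 20 × 20 matrix. A minimal sketch of a row-wise fix, assuming the question's `eventR()` and data are kept as they are, is to walk all three columns in parallel with base R's `mapply()`:

```r
# Rebuild the example data from the question
set.seed(1337)
rand_dates <- Sys.Date() - 365:1
df <- data.frame(
  event_date = sample(rand_dates, 20),
  death_date = sample(rand_dates, 20),
  study_over = sample(rand_dates, 20)
)
df[c(3, 5, 7), "event_date"] <- NA
df[c(3, 4, 6), "death_date"] <- NA

# eventR() as in the question (only re-indented): it expects single dates x, y, z
eventR <- function(x, y, z) {
  if (is.na(y)) {
    ifelse(x <= z, 1, 0)
  } else if (y <= z) {
    ifelse(x < y, 1, 0)
  } else {
    ifelse(x <= z, 1, 0)
  }
}

# mapply() calls eventR once per row with the i-th element of each column,
# so every call gets scalars and returns a single 1, 0 or NA
df$event <- mapply(eventR, df$event_date, df$death_date, df$study_over)
df
```

Inside the dplyr pipeline the same idea would read `df %>% mutate(event = mapply(eventR, event_date, death_date, study_over))`; naming the columns directly rather than going through `.$` is the usual dplyr style. Calling a function row by row is fine for 20 rows but will be slower than a vectorized comparison on large data.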


Solution

  • As already pointed out, your specification does not cover all cases, but this would seem to construct a binary column (with NA wherever event_date or death_date is missing) in which 1 indicates "event_date is before death_date or study_over" and 0 is used elsewhere:

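    # 1 if event_date is earlier than the later of death_date and study_over;
    # NA if event_date or death_date is missing
    df$event <- with(df, as.numeric(event_date < pmax(death_date, study_over)))
    df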
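One caveat about `pmax()`: it compares `event_date` with the *later* of the two dates, and it returns `NA` for every row where `death_date` is missing. The question's `eventR()` behaves differently. It compares against whichever date comes *first* (strictly before `death_date`, on or before `study_over`) and falls back to `study_over` when `death_date` is `NA`. If that is the rule you actually want (this is a reading of `eventR()`, not something the answer states), a vectorized sketch could look like the following; `death_first` is just a hypothetical helper name:

```r
# Rebuild the example data from the question
set.seed(1337)
rand_dates <- Sys.Date() - 365:1
df <- data.frame(
  event_date = sample(rand_dates, 20),
  death_date = sample(rand_dates, 20),
  study_over = sample(rand_dates, 20)
)
df[c(3, 5, 7), "event_date"] <- NA
df[c(3, 4, 6), "death_date"] <- NA

# TRUE where death_date is known and not later than study_over;
# FALSE & NA is FALSE, so a missing death_date falls through to study_over
death_first <- with(df, !is.na(death_date) & death_date <= study_over)

# Strict comparison against death_date, inclusive against study_over,
# mirroring the branches of eventR(); NA only where event_date is NA
df$event <- with(df, as.numeric(ifelse(death_first,
                                       event_date < death_date,
                                       event_date <= study_over)))
df
```

A shorter `as.numeric(event_date <= pmin(death_date, study_over, na.rm = TRUE))` gives the same result except when event_date falls exactly on death_date, where it returns 1 while `eventR()` returns 0.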