I would like to create a column of 0s and 1s based on inequalities of three columns of dates.
The idea is the following. If event_date
is before death_date
or study_over
, the the column event
should be ==1, if event_date
occurs after death_date
or study_over
, event
should be == 0. Both event_date
and death_date
may contain NAs.
set.seed(1337)
rand_dates <- Sys.Date() - 365:1
df <-
data.frame(
event_date = sample(rand_dates, 20),
death_date = sample(rand_dates, 20),
study_over = sample(rand_dates, 20)
)
My attempt was the following
eventR <-
function(x, y, z){
if(is.na(y)){
ifelse(x <= z, 1, 0)
} else if(y <= z){
ifelse(x < y, 1, 0)
} else {
ifelse(x <= z, 1, 0)
}
}
I use it in the following manner
library(dplyr)
df[c(3, 5, 7), "event_date"] <- NA #there are some NA in .$event_date
df[c(3, 4, 6), "death_date"] <- NA #there are some NA in .$death_date
df %>%
mutate(event = sapply(.$event_date, eventR, y = .$death_date, z = .$study_over))
##Error: wrong result size (400), expected 20 or 1
##In addition: There were 40 warnings (use warnings() to see them)
I can't figure out how to do this. Any suggestions?
This would seem to construct a binary column (with NA's where needed) where 1 indicates "event_date is before death_date or study_over" and 0 is used elsewhere. As already pointed out your specification does not cover all cases:
df$event <- with(df, as.numeric( event_date < pmax( death_date , study_over) ) )
df