Suppose that I have a data frame as follows:
In dat, idp identifies a person. Each value of a1, ..., a4 represents a status, where 4 indicates "dying"/death.
For any idp, once a 4 occurs, all following values need to be set to 4. If an NA occurs, it should be replaced by the most recent preceding state that is not NA. Finally, if the sequence starts with NA, the leading NAs should take the first non-missing state that appears in the vector.
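The three rules can be checked on a single person's sequence with a small base-R sketch. The helper name fill_status() and the example vector are hypothetical (not from the question), and this uses cummax() on positions instead of the zoo package:

```r
# Hypothetical helper: apply the three rules to one person's status vector,
# using base R only (no zoo dependency).
fill_status <- function(x) {
  # Rule 2: replace each NA with the most recent preceding non-NA value.
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L
  idx <- cummax(idx)                 # position of latest non-NA (0 if none yet)
  out <- x
  out[idx > 0] <- x[idx[idx > 0]]
  # Rule 3: leading NAs take the first non-missing value.
  first <- which(!is.na(x))[1]
  if (!is.na(first)) out[seq_len(first - 1)] <- x[first]
  # Rule 1: from the first 4 onward, every value becomes 4.
  hit4 <- which(out == 4)
  if (length(hit4) > 0) out[min(hit4):length(out)] <- 4
  out
}

fill_status(c(NA, 1, NA, 2, 4, NA, 3))
# [1] 1 1 1 2 4 4 4
```

A sequence that is entirely NA is returned unchanged, since there is no state to copy.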
Split by idp, fill the NAs, then find the first 4 (if any) and set it and every later value to 4:
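# split per idp and loop
l <- lapply(split(dat$outcome, dat$idp), function(i){
  # fill NA: carry the last value forward, then fill leading NAs backward
  out <- zoo::na.locf(zoo::na.locf(i, na.rm = FALSE),
                      na.rm = FALSE, fromLast = TRUE)
  # fill 4 if any: check for a match before taking min(), because
  # min(integer(0)) returns Inf (with a warning) and length(Inf) is 1
  ix4 <- which(out == 4)
  if(length(ix4) > 0){ out[ min(ix4):length(out) ] <- 4 }
  out
})
l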
# $A
# [1] 1 1 1 2 3 4 4 4 4 4
# $B
# [1] 3 3 3 3 4 4 4 4 4 4
# $C
# [1] 1 1 1 1 2 2 4 4 4 4
# $D
# [1] 4 4 4 4 4 4 4 4 4 4
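The "fill 4 if any" step can also be written without which() and min(). This is a sketch of an alternative, not the answer's code; the vector out below is made up and is assumed to have had its NAs filled already, because cumsum() would propagate a remaining NA:

```r
# Made-up, already NA-filled sequence (hypothetical example data).
out <- c(1, 1, 2, 4, 3, 2)

# The running count of 4s is 0 before the first 4 and positive from it onward,
# so it marks exactly the positions to overwrite; with no 4, nothing changes.
out[cumsum(out == 4) > 0] <- 4
out
# [1] 1 1 2 4 4 4
```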
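Convert back to a data frame. This relies on dat being sorted by idp, because split() returns the groups in factor-level order and unlist() simply concatenates them: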
# convert back to dataframe
head(cbind(dat, outcomeNew = unlist(l, use.names = FALSE)), 10)
#    idp outcome outcomeNew
# 1    A       1          1
# 2    A       1          1
# 3    A       1          1
# 4    A       2          2
# 5    A       3          3
# 6    A       4          4
# 7    A       3          4
# 8    A       4          4
# 9    A       2          4
# 10   A       2          4