Data Cleaning for Survival Analysis

I’m cleaning some data for a survival analysis, and I want each individual to have only a single, sustained transition from symptom present (ss=1) to symptom remitted (ss=0). A remission only counts if it is complete and sustained. Statistical issues aside, I’m wondering how to address the problems detailed below.

I’ve been trying to break the problem into smaller, more manageable operations and objects. However, the solutions I keep arriving at force me to use conditional logic based on the rows immediately above and below a missing value and, quite frankly, I’m at a bit of a loss as to how to do this. I would love some guidance if you know of a good technique I can use or experiment with, or of any good search terms for looking up a solution.

The details are below:

#Fake dataset creation
id <- c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4)
time <- c(0,1,2,3,4,5,6,0,1,2,3,4,5,6,0,1,2,3,4,5,6,0,1,2,3,4,5,6)
ss <- c(1,1,1,1,NA,0,0,1,1,0,NA,0,0,0,1,1,1,1,1,1,NA,1,1,0,NA,NA,0,0)
mydat <- data.frame(id, time, ss)

The desired values of ss for each ID are listed below.

The goal here is to find a way to get the NA values for ID #1 (variable ss) to look like this: 1,1,1,1,1,0,0

ID #2 (variable ss) to look like this: 1,1,0,0,0,0,0

ID #3 (variable ss) to look like this: 1,1,1,1,1,1,NA (no change because the row with NA will be deleted eventually)

ID #4 (variable ss) to look like this: 1,1,1,1,1,0,0 (this one requires multiple changes and I expect it is the most challenging to tackle).
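
To make the targets above easy to check against any candidate solution, they can be stored next to the raw data. This is only a sketch: the column name ss_target and the helper vector changed are names I made up for illustration, and the target values are copied from the list above.

```r
# Rebuild the example data (same values as the fake dataset above)
id   <- rep(1:4, each = 7)
time <- rep(0:6, times = 4)
ss   <- c(1,1,1,1,NA,0,0, 1,1,0,NA,0,0,0, 1,1,1,1,1,1,NA, 1,1,0,NA,NA,0,0)
mydat <- data.frame(id, time, ss)

# Desired values, copied from the targets listed for IDs 1 to 4
# (ss_target is a hypothetical column name)
mydat$ss_target <- c(1,1,1,1,1,0,0, 1,1,0,0,0,0,0, 1,1,1,1,1,1,NA, 1,1,1,1,1,0,0)

# Rows where the target differs from the raw data.
# identical() treats NA vs NA as equal, so ID #3's trailing NA is not flagged.
changed <- !mapply(identical, mydat$ss, mydat$ss_target)
mydat[changed, ]
```

Note that for ID #4 the flagged rows include the observed 0 at time 2, not only the NA rows.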

Solution

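I don't think you have considered all the edge cases. What should happen with two NAs in a row at the end of a period, or with 4 or 5 NAs in a row? The following uses the na.locf function from the zoo package and gives the requested result for IDs 1 to 3 in your tiny test case. For ID #4 it carries the 0 at time 2 forward and returns 1,1,0,0,0,0,0 rather than the requested 1,1,1,1,1,0,0: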

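library(zoo)  # provides na.locf()

# Carry the last observation forward within a vector, unless the vector
# ends in NA, in which case it is returned unchanged.
# na.rm = FALSE keeps leading NAs so the length matches what ave() expects.
fillNA <- function(vec) {
  if (is.na(tail(vec, 1))) vec else na.locf(vec, na.rm = FALSE)
}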

> mydat$locf <- with(mydat, ave(ss, id, FUN=fillNA))
> mydat
   id time ss locf
1   1    0  1    1
2   1    1  1    1
3   1    2  1    1
4   1    3  1    1
5   1    4 NA    1
6   1    5  0    0
7   1    6  0    0
8   2    0  1    1
9   2    1  1    1
10  2    2  0    0
11  2    3 NA    0
12  2    4  0    0
13  2    5  0    0
14  2    6  0    0
15  3    0  1    1
16  3    1  1    1
17  3    2  1    1
18  3    3  1    1
19  3    4  1    1
20  3    5  1    1
21  3    6 NA   NA
22  4    0  1    1
23  4    1  1    1
24  4    2  0    0
25  4    3 NA    0
26  4    4 NA    0
27  4    5  0    0
28  4    6  0    0
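
The locf column above does not match the request for ID #4, and fillNA leaves a series completely untouched when it ends in NA, even if it also has NAs in the middle. Below is a rough base-R sketch of one way to handle both. It rests on assumptions that are mine and not stated in the question: every individual starts with the symptom present, a gap of more than max_gap consecutive NAs means any remission seen before the gap was not sustained (so everything up to the end of that gap is recoded to 1), and trailing NAs are left alone. I guessed that rule from the requested output for ID #4. The names locf_base, fill_sustained and max_gap are made up for this illustration.

```r
# Example data from the question
id   <- rep(1:4, each = 7)
time <- rep(0:6, times = 4)
ss   <- c(1,1,1,1,NA,0,0, 1,1,0,NA,0,0,0, 1,1,1,1,1,1,NA, 1,1,0,NA,NA,0,0)
mydat <- data.frame(id, time, ss)

# Base-R last-observation-carried-forward; leading NAs stay NA
locf_base <- function(x) {
  idx <- cumsum(!is.na(x))        # how many observed values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # idx 0 (before any observation) maps to NA
}

# ASSUMED rule (not from the question): interior NA runs longer than max_gap
# invalidate any earlier remission; shorter runs are filled by LOCF;
# trailing NAs are kept as NA.
fill_sustained <- function(x, max_gap = 1) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)          # nothing observed, nothing to fill
  last_obs <- max(obs)

  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)             # last position of each run
  long_gap <- runs$values & runs$lengths > max_gap & ends < last_obs
  if (any(long_gap)) {
    # Recode everything up to the end of the last long gap as symptom present
    x[seq_len(max(ends[long_gap]))] <- 1
  }

  filled <- locf_base(x)
  filled[seq_along(x) > last_obs] <- NA    # restore trailing NAs
  filled
}

mydat$ss_filled <- ave(mydat$ss, mydat$id, FUN = fill_sustained)
mydat
```

On the example data this should give 1,1,1,1,1,0,0 for ID #4 while leaving IDs 1 to 3 as requested. Calling fill_sustained with max_gap = Inf turns the gap rule off and reduces it to plain LOCF that still preserves trailing NAs. Whether a long gap should really reset an earlier remission is a substantive decision for your analysis, so treat the default as a placeholder.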