Using lag variable, by group


I have a data frame (shown below) and need code to produce a variable, change. change marks the first visit at which the outcome becomes permanently positive (outcome = 1).

The logic is as follows:

  • Each ID has 5 visits with the value of the outcome at each visit
  • The change variable can only be 1 if the outcome is 1 at visit x and at every visit thereafter
  • For example, id 2 cannot have change = 1 at visit 2 because the outcome reverts to a negative result at visit 3.
  • An additional wrinkle is missing data. The outcome for id 3 at visit 2 could be 1 or 0. Since the value at this visit could be 1, change should be 1 at that visit.

My data, with the desired output variable change, is:

id visit outcome change
1   1     0       0
1   2     0       0
1   3     0       0 
1   4     1       1
1   5     1       0

2   1     0       0
2   2     1       0
2   3     0       0
2   4     1       1
2   5     1       0  

3   1     0       0
3   2     NA      1
3   3     1       1
3   4     1       0
3   5     1       0

Solution

  • You can do this easily with dplyr:

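    library(dplyr)
    df <- data.frame(id = rep(c(1, 2, 3), each = 5), visit = rep(1:5, 3),
                     outcome = c(0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, NA, 1, 1, 1))
    df %>%
      group_by(id) %>%
      # 1 if this visit and the next are both 1; the last visit is always 0
      mutate(change = as.numeric(lead(outcome) == 1 & outcome == 1),
             change = ifelse(visit == 5, 0, change),
             # a missing outcome could be 1, so take the next visit's flag
             change = ifelse(is.na(change), lead(change), change),
             # clear a flag that repeats the previous visit's flag,
             # unless that previous visit's outcome was missing
             change = ifelse(change == 1 & lag(change, default = 0) == 1 &
                               !is.na(lag(outcome)), 0, change))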
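
The pairwise comparison above looks only one visit ahead and hard-codes five visits. If a positive run can be interrupted later, or the number of visits varies, a more general sketch may help. The version below is not from the original answer: flag_change is a hypothetical helper. It assumes that rows are sorted by visit within each id, that a missing outcome counts as a possible 1, and that missing visits at the start of a permanent run are flagged together with the first observed 1 (which is how id 3 appears in the desired output). Unlike the pipeline above, it would also flag a run that begins at the final visit.

```r
# Hypothetical helper: flags the change visit(s) for ONE id,
# given that id's outcomes sorted by visit.
flag_change <- function(outcome) {
  # Assumption: a missing outcome could have been 1, so it does not break a run.
  could_be_1 <- is.na(outcome) | outcome == 1
  # TRUE where this visit and every later visit could be 1 (reverse cumulative product).
  permanent <- rev(cumprod(rev(could_be_1))) == 1
  # Observed (non-missing) 1s inside the permanent run.
  observed_1 <- permanent & !is.na(outcome)
  # Number of observed 1s in the run strictly before each visit.
  seen_before <- cumsum(observed_1) - observed_1
  # Flag the run's visits up to and including its first observed 1.
  as.numeric(permanent & seen_before == 0)
}

df <- data.frame(id = rep(c(1, 2, 3), each = 5), visit = rep(1:5, 3),
                 outcome = c(0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, NA, 1, 1, 1))

# Apply the helper within each id using base R's ave().
df$change <- ave(df$outcome, df$id, FUN = flag_change)
print(df)
```

With dplyr, the same helper should slot into the existing pipeline as df %>% group_by(id) %>% mutate(change = flag_change(outcome)), after sorting by id and visit.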