
R: Define starting condition for continuous value


I'm trying to add two new variables to an existing data.frame. Each should be a running value, starting at 0 or 1 respectively, that begins once a condition is met within each ID in the data.frame. The data.frame has a structure similar to this:

ID   Var1
1     0
1     2 
1     5
1     12
2     0
2     2 
2     NA
2     11

and I want to get to:

ID  Var1   start   stop
1    0       0      0
1    2       0      1
1    5       1      2
1    12      2      3
2    0       0      0
2    2       0      1
2    NA      1      2
2    11      2      3

start should be a running value that begins once Var1 > 0 for the first time within an ID, and stop should work the same way. start's starting value should be 0 and stop's starting value should be 1. Both should keep running even if Var1 becomes NA or 0 again later in the data.frame. I have tried the following:

df %>%
  group_by(ID) %>%
  mutate(stop = ifelse(Var1 > 0, 
  0:nrow(df), 0))

But the variable it returns doesn't start at 0; it starts at the number of the row in which the condition is first met.


Solution

  • Here is a base R option using ave + replace:

    transform(df,
      Start = ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum) > 0, ID, FUN = function(x) cumsum(c(0, x))[-(length(x) + 1)]),
      Stop = ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum) > 0, ID, FUN = cumsum)
    )
    

    or

    transform(df,
      Start = ave(ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum), ID, FUN = cumsum) > 1, ID, FUN = cumsum),
      Stop = ave(ave(replace(Var1, is.na(Var1), 0) > 0, ID, FUN = cumsum) > 0, ID, FUN = cumsum)
    )
    

    which gives

      ID Var1 Start Stop
    1  1    0     0    0
    2  1    2     0    1
    3  1    5     1    2
    4  1   12     2    3
    5  2    0     0    0
    6  2    2     0    1
    7  2   NA     1    2
    8  2   11     2    3
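
    Since the question uses dplyr, the same logic can also be sketched as a grouped mutate. This is only a sketch, not part of the answer above: it assumes dplyr is installed, rebuilds the sample data from the question, and introduces a helper column named started purely for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2),
  Var1 = c(0, 2, 5, 12, 0, 2, NA, 11)
)

res <- df %>%
  group_by(ID) %>%
  mutate(
    # TRUE from the first row per ID where Var1 > 0; NA is treated as "not met"
    started = cumsum(coalesce(Var1 > 0, FALSE)) > 0,
    # stop counts the rows since the condition was first met: 1, 2, 3, ...
    stop    = cumsum(started),
    # start trails stop by one step and never drops below 0
    start   = pmax(stop - 1, 0)
  ) %>%
  ungroup() %>%
  select(ID, Var1, start, stop)

print(res)
```

    The first cumsum turns the condition into a flag that stays TRUE after its first hit, which is why later NA or 0 values do not reset the counters. The original attempt instead picks values by position from 0:nrow(df), so its result depends on where in the group the row sits rather than on how many rows have passed since the condition was first met.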