I have a dataset where I have a binary indicator of event occurences. From this list, I would like to create a count of the number of consecutive timesteps without an event occuring. As an example (TS = Timestep, EV = Event Indicator, C = Count):
TS1 -> TS2 -> TS3 -> TS4 -> TS5 ->...
EV0 -> EV0 -> EV1 -> EV0 -> EV0 ->...
C0 -> C1 -> C0 -> C0 -> C1 ->...
As an example dataframe, please consider:
labs <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D", "D")
time <- c(1,2,3,4 ,1,2,3,4 ,1,2,3,4 ,1,2,3,4,5)
event <- c(0,0,0,0, 0,1,0,0, 1,1,0,0, NA,0,0,1,0)
desiredOutcome <- c(0,1,2,3,0,0,0,1,0,0,0,1,NA,0,1,0,0) # goal
exDF <- data.frame(labs,time, event, desiredOutcome)
From that end goal and dataframe, I ended up with the following code:
library(dplyr)
exDF <- exDF %>%
group_by(labs) %>%
mutate(pe1 = lag(event, order_by=time)) # create new variable for prior event
exDF$count2 <- ifelse(
((exDF$pe1 == 1) & (exDF$event == 0)), # condition checks for rows where previous timestep is included & had event WHERE event is not ongoing in this timestep
0, # True val
NA) # False val
exDF$count <- ifelse(
(is.na(exDF$pe1) & (exDF$event == 0)), # condition checks for rows where previous timestep is not included & no current event
0, # True val
exDF$count2) # False val
It seems to fill in all of the zeros correctly. But, I am unaware of a good way to get from the appropriate 0s filled and the other values with NAs to my desired outcome.
Most of my experimentation related to combining mutate and lag, but they lead to just the next set of values filled in (if the zeros were in the input column, the ones alone show up; if ones, then twos). The following example doesn't attempt to handle the reset of the count, but leads to the behavior described above:
exDF <- exDF %>%
group_by(labs) %>%
mutate(countFinal = lag(count, order_by=time) + 1)
So, my challenge relates to the order in which things are resolved. With something like the mutate command here, the order seems to be:
Pull all cell values by label -> Look at their lags -> Add 1 -> Done, but incorrectly
When I need it to be something like:
Pull first cell value by label -> Look at lag -> Add 1 or reset -> Pull second cell (filled in prior step) value by label -> Look at their lags -> Add 1 or reset -> Pull third... -> Done
Is there a good way to do this with an existing package?
Could not think of a more direct approach, but this works. The workflow:
replace()
first value in each event grouping with zero and change remaining non-zero group id values to 1library(dplyr)
exDF |>
group_by(labs) |>
mutate(tmp = if_else(is.na(event), 2, event),
tmp = cumsum(tmp != lag(tmp, default = 1))) |>
group_by(labs, tmp) |>
mutate(tmp = replace(tmp, 1, 0),
tmp = if_else(tmp != 0, 1, 0),
tmp = cumsum(tmp),
desiredOutcome = if_else(is.na(event), NA, desiredOutcome)) |>
ungroup() |>
select(-tmp)
# # A tibble: 17 × 4
# labs time event desiredOutcome
# <chr> <dbl> <dbl> <dbl>
# 1 A 1 0 0
# 2 A 2 0 1
# 3 A 3 0 2
# 4 A 4 0 3
# 5 B 1 0 0
# 6 B 2 1 0
# 7 B 3 0 0
# 8 B 4 0 1
# 9 C 1 1 0
# 10 C 2 1 0
# 11 C 3 0 0
# 12 C 4 0 1
# 13 D 1 NA NA
# 14 D 2 0 0
# 15 D 3 0 1
# 16 D 4 1 0
# 17 D 5 0 0