I have data that looks like the following:
df <- tibble(
date= seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"),
val1 = c(105, 105, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150),
val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150),
diff = val1-val2)
I am trying to produce the following:
output <- tibble(
date= seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"),
val1 = c(105, 105, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150),
val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150),
diff = val1-val2,
diff_calc = c(0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5, 22, 22, 22))
Where diff_calc
is the cumulative sum of the previous unique values in diff
, where the summing occurs starting where diff
= 0 for the previous diff
value, and is replicated until diff
is 0, and the previous unique diff
values are again cumulatively summed.
I have tried different combinations of lags and joins, but am really struggling here. Thanks!
This is very similar to @jblood94's answer, but might be a little simpler
df |>
mutate(
diff_calc = cummax(cumsum(diff * !duplicated(diff)) * (diff == 0)),
)