Search code examples
rdatecumsum

R Complex Date Related and Cumulative Value Summation in R


I have data that looks like the following:

df <- tibble(
  date= seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"),
  val1 = c(105, 105, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150),
  val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150),
  diff = val1-val2)

I am trying to produce the following:

output <- tibble(
  date= seq.Date(as.Date("2021-01-01"), as.Date("2022-02-01"), by = "month"),
  val1 = c(105, 105, 105, 125, 125, 125, 125, 132, 132, 132, 135, 150, 150, 150),
  val2 = c(100, 100, 100, 125, 125, 125, 125, 125, 125, 125, 125, 150, 150, 150),
  diff = val1-val2,
  diff_calc = c(0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 5, 22, 22, 22))

Where diff_calc is the cumulative sum of the previous unique values in diff, where the summing occurs starting where diff = 0 for the previous diff value, and is replicated until diff is 0, and the previous unique diff values are again cumulatively summed.

I have tried different combinations of lags and joins, but am really struggling here. Thanks!


Solution

  • This is very similar to @jblood94's answer, but might be a little simpler

    df |> 
      mutate(
        diff_calc = cummax(cumsum(diff * !duplicated(diff)) * (diff == 0)),
      )