Search code examples
rcumsum

Resetting a cumsum to a value in a variable


I have a simple dataset, with date, sample, and application. Every day the value in the sample should decrease by 1 and increase by whatever is applied. When we take another sample we start over from the new value. The result is given in the variable estimate. I'm trying to calculate the variable estimate in R.

Here's my dataset:

structure(
    list(
        date = structure(c(17591, 17592, 17593, 17594, 
          17595, 17596, 17597, 17598, 17599, 17600, 17601, 17602, 17603, 
          17604, 17605, 17606, 17607, 17608, 17609, 17610, 17611, 17612, 
          17613, 17614), class = "Date"),
        sample = c(30, NA, NA, NA, NA, 20, NA, NA, 40, NA, NA, 
          NA, NA, NA, 35, NA, NA, 15, NA, NA, NA, 
          NA, 5, NA), 
        applied = c(NA, NA, 10, NA, NA, NA, NA, 10, NA, NA, 
          15, NA, NA, NA, 10, NA, NA, NA, 15, NA, NA, 10, NA, NA), 
        estimate = c(30, 29, 38, 37, 36, 20, 19, 28, 40, 39, 53, 52, 
          51, 50, 35, 34, 33, 15, 29, 28, 27, 36, 5, 4)
        ), 
    class = "data.frame", row.names = c(NA, -24L)
)

Solution

  • This could probably be cleaned up a little, but it works:

    library(dplyr)
    df %>% 
      mutate(group = cumsum(!is.na(sample))) %>%
      group_by(group) %>%
      mutate(
        to_add = ifelse(!is.na(sample), 0, ifelse(is.na(applied), -1, applied - 1)),
        result = first(sample) + cumsum(to_add)
      )
    # # A tibble: 24 x 7
    # # Groups:   group [6]
    #    date       sample applied estimate group to_add result
    #    <date>      <dbl>   <dbl>    <dbl> <int>  <dbl>  <dbl>
    #  1 2018-03-01     30      NA       30     1      0     30
    #  2 2018-03-02     NA      NA       29     1     -1     29
    #  3 2018-03-03     NA      10       38     1      9     38
    #  4 2018-03-04     NA      NA       37     1     -1     37
    #  5 2018-03-05     NA      NA       36     1     -1     36
    #  6 2018-03-06     20      NA       20     2      0     20
    #  7 2018-03-07     NA      NA       19     2     -1     19
    #  8 2018-03-08     NA      10       28     2      9     28
    #  9 2018-03-09     40      NA       40     3      0     40
    # 10 2018-03-10     NA      NA       39     3     -1     39
    # # ... with 14 more rows