Search code examples
rdplyrcumsum

Dpylr solution for cumsum with a factor reset


I need a dpylr solution that creates a cumsum column.

# Input dataframe
df <- data.frame(OilChanged = c("No","No","Yes","No","No","No","No","No","No","No","No","Yes","No"),
Odometer = c(300,350,410,420,430,450,500,600,600,600,650,660,700))

# Create difference column - first row starting with zero
df <- df %>% dplyr::mutate(Odometer_delta = Odometer - lag(Odometer, default = Odometer[1]))

I'm trying to make a reset condition based on the factor column for a cumulative sum. The result needs to be exactly like this.

# Wanted result dataframe
df <- data.frame(OilChanged = c("No","No","Yes","No","No","No","No","No","No","No","No","Yes","No"),
                   Odometer = c(300,350,410,420,430,450,500,600,600,600,650,660,700),
                   Diff = c(0,50,60,10,10,20,50,100,0,0,50,10,40),
                   CumSum = c(0,50,110,10,20,40,90,190,190,190,240,250,40))

Solution

  • You can create a new group everytime OilChanged == 'Yes' and take cumsum of Diff value in each group.

    library(dplyr)
    
    df %>%
      group_by(grp = lag(cumsum(OilChanged == 'Yes'), default = 0)) %>%
      mutate(newcumsum = cumsum(Diff)) %>%
      ungroup %>%
      select(-grp)
    
    
    #  OilChanged Odometer  Diff CumSum newcumsum
    #   <chr>         <dbl> <dbl>  <dbl>     <dbl>
    # 1 No              300     0      0         0
    # 2 No              350    50     50        50
    # 3 Yes             410    60    110       110
    # 4 No              420    10     10        10
    # 5 No              430    10     20        20
    # 6 No              450    20     40        40
    # 7 No              500    50     90        90
    # 8 No              600   100    190       190
    # 9 No              600     0    190       190
    #10 No              600     0    190       190
    #11 No              650    50    240       240
    #12 Yes             660    10    250       250
    #13 No              700    40     40        40