Search code examples
rcumsum

calculating cumulatives within a group correctly


I hope anyone can help with this. I have a data frame similar to this:

test <- data.frame(ID = c(1:24),
                  group = rep(c(1,1,1,1,1,1,2,2,2,2,2,2),2),
                  year1 = rep(c(2018,2018,2018,2019,2019,2019),4),
                  month1 = rep(c(1,2,3),8))

Now I want to do a cumsum per group but when I use the following code the sumsum 'restarts' each year.

test2 <-test %>% 
  group_by(group,year1,month1) %>% 
  summarise(a = length(unique(ID)))  %>%
  mutate(a = cumsum(a))

My desired output is:

   group year1 month1  a
1      1  2018      1  2
2      1  2018      2  4
3      1  2018      3  6
4      1  2019      1  8
5      1  2019      2 10
6      1  2019      3 12
7      2  2018      1  2
8      2  2018      2  4
9      2  2018      3  6
10     2  2019      1  8
11     2  2019      2 10
12     2  2019      3 12

Solution

  • You could first count unique ID for each group, month and year and then take cumsum of it for each group.

    library(dplyr)
    
    test %>%
      group_by(group, year1, month1) %>%
      summarise(a = n_distinct(ID)) %>%
      group_by(group) %>%
      mutate(a = cumsum(a)) 
    
    #   group year1 month1     a
    #   <dbl> <dbl>  <dbl> <int>
    # 1     1  2018      1     2
    # 2     1  2018      2     4
    # 3     1  2018      3     6
    # 4     1  2019      1     8
    # 5     1  2019      2    10
    # 6     1  2019      3    12
    # 7     2  2018      1     2
    # 8     2  2018      2     4
    # 9     2  2018      3     6
    #10     2  2019      1     8
    #11     2  2019      2    10
    #12     2  2019      3    12