Search code examples
rdataframedplyrcumulative-sum

Cumulative sum per group in tidyverse R


I have paneldata, i.e. repeated observations per household. A unit (household) is measured through time and exhibits a characteric (e.g. variable). I can calculate a group sum per year with group_by(id, year). How can I have a cummulative sum over time as in the goal column? I need the result to preserve 10 rows in this example, i.e. not collapse data to the years. How can I pick just one group-sum per year per unit to sum up?

set.seed(1234)
data <- data.frame(id = rep(100, 10),
                   year = c(rep(2022, 5), rep(2023, 5)),
                   variable = rbinom(10, 1, 0.5))

library(tidyverse)

data <- data %>% 
  group_by(id, year) %>% 
  mutate(group_sum_per_year = sum(variable)) 

data$goal <- c(4,4,4,4,4,7,7,7,7,7)

data

# A tibble: 10 × 5
# Groups:   id, year [2]
      id  year variable group_sum_per_year  goal
   <dbl> <dbl>    <int>              <int> <dbl>
 1   100  2022        0                  4     4
 2   100  2022        1                  4     4
 3   100  2022        1                  4     4
 4   100  2022        1                  4     4
 5   100  2022        1                  4     4
 6   100  2023        1                  3     7
 7   100  2023        0                  3     7
 8   100  2023        0                  3     7
 9   100  2023        1                  3     7
10   100  2023        1                  3     7

Solution

  • You could first create a tempory column hlp which is equal to group_sum_per_year for only the first entry per group.

    Then you could group by id and use cumsum on hlp:

    data %>% 
      group_by(id, year) %>% 
      mutate(group_sum_per_year = sum(variable)) %>% 
      mutate(hlp = if_else(1:n() == 1, group_sum_per_year, 0)) %>%
      group_by(id) %>%
      mutate(goal = cumsum(hlp))
    
    # A tibble: 10 × 6
    # Groups:   id [1]
          id  year variable group_sum_per_year   hlp  goal
       <dbl> <dbl>    <int>              <int> <dbl> <dbl>
     1   100  2022        0                  4     4     4
     2   100  2022        1                  4     0     4
     3   100  2022        1                  4     0     4
     4   100  2022        1                  4     0     4
     5   100  2022        1                  4     0     4
     6   100  2023        1                  3     3     7
     7   100  2023        0                  3     0     7
     8   100  2023        0                  3     0     7
     9   100  2023        1                  3     0     7
    10   100  2023        1                  3     0     7