Using cumsum with mutate


I'm trying to use cumsum and mutate to create a column showing growth over time. I'm able to produce what I want with cumsum on its own by naming the column in question -- but the ultimate goal is to use across to apply cumsum to an arbitrary number of columns. (Figured I should get it right on one column first ...)

Here's what I'm working with:

dat <- data.frame(year=c("2008", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021", "2022", "2023"), count = c(1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 0, 2, 2, 3))

I can create a cumulative column with

dat[,"CulmCount1"]<-cumsum(dat$count)

and thought that I could do the same with

dat <- dat %>% group_by(count) %>% mutate(CulmCount2 = cumsum(count))

but that stops the cumulative count after 2015 and doesn't make any sense at all when we get to 2023. (If it didn't work at all, I'd think I was grouping wrong, but I don't understand why it stops cumulating.)


Solution

  • You can just use cumsum directly, no grouping needed:

    dat <- dat |> mutate(ct = cumsum(count))
    
    assertthat::are_equal(dat$CulmCount1, dat$ct) #TRUE
    

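    Note: The cumsum values look odd in your group_by version if you read them in the order the rows appear in dat. But remember that you're grouping by count, so cumsum runs separately within each group: the six 1's accumulate together (1 through 6, which is why it looks right up to 2015), the 2's accumulate on their own (2, 4, 6), the 3's on their own (3, 6, 9, 12), and the single 0 stays at 0.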

    Try

    dat |> 
      arrange(count) |> 
      group_by(count) |> 
      mutate(CulmCount2 = cumsum(count))
    

    and it'll be a lot clearer to see what was going on with your code.
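
Since the end goal in the question is to apply cumsum to several columns with across, here is a minimal sketch of how that might look. The second numeric column (other) and the "Culm_" prefix are made up for illustration and are not from the question. The ungroup() call is only a precaution: if dat still carries the grouping from the earlier group_by(count), mutate would keep computing per-group sums.

```r
library(dplyr)

# Same data as in the question, plus one hypothetical extra numeric column
dat <- data.frame(
  year  = c("2008", "2011", "2012", "2013", "2014", "2015", "2016",
            "2017", "2018", "2019", "2020", "2021", "2022", "2023"),
  count = c(1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 0, 2, 2, 3),
  other = c(0, 2, 0, 1, 1, 0, 4, 0, 1, 2, 3, 0, 1, 1)  # hypothetical
)

cum_dat <- dat |>
  ungroup() |>                      # drop any leftover grouping first
  mutate(across(where(is.numeric),  # every numeric column (year is character)
                cumsum,
                .names = "Culm_{.col}"))  # new columns: Culm_count, Culm_other
```

where(is.numeric) is just one way to pick the columns; a explicit selection such as c(count, other) inside across should work the same way if you only want specific columns.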