I'm trying to use cumsum and mutate to create a column showing growth over time. I'm able to produce what I want with cumsum on its own by naming the column in question, but the ultimate goal is to use across to apply cumsum to an arbitrary number of columns. (Figured I should get it right on one column first ...)
Here's what I'm working with:
dat <- data.frame(year=c("2008", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021", "2022", "2023"), count = c(1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 0, 2, 2, 3))
I can create a cumulative column with:
dat[, "CulmCount1"] <- cumsum(dat$count)
and thought that I could do the same with:
library(dplyr)
dat <- dat %>% group_by(count) %>% mutate(CulmCount2 = cumsum(count))
but that stops the cumulative count after 2015 and doesn't make any sense at all when we get to 2023. (If it didn't work at all, I'd think I was grouping wrong, but I don't understand why it stops cumulating.)
You can just use cumsum directly, no grouping needed:
# ungroup() first, in case dat is still grouped by count from the earlier attempt
dat <- dat |> ungroup() |> mutate(ct = cumsum(count))
assertthat::are_equal(dat$CulmCount1, dat$ct) # TRUE
Note: The cumsum values look kind of funny in your group_by version if you just look at them in the order they appear in dat. But remember that you're grouping by count, so all the 2's go together, the 3's go together, etc.
Try:
dat |>
  arrange(count) |>
  group_by(count) |>
  mutate(CulmCount2 = cumsum(count))
and it'll be a lot clearer to see what was going on with your code.
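Since the end goal was across, here's a minimal sketch of how that might look once the grouping is out of the way. The data frame dat2 and its count2 column are made up for illustration (your real columns will differ), and starts_with("count") is just one possible way to pick the columns:

```r
library(dplyr)

# Hypothetical data: dat2 and count2 are invented purely for illustration
dat2 <- data.frame(
  year   = c("2008", "2011", "2012"),
  count  = c(1, 1, 2),
  count2 = c(0, 3, 1)
)

# Apply cumsum to every column whose name starts with "count",
# writing each result to a new column prefixed with "Culm_"
dat2 <- dat2 |>
  mutate(across(starts_with("count"), cumsum, .names = "Culm_{.col}"))
```

This adds Culm_count and Culm_count2 next to the originals. Leaving out .names would overwrite the original columns instead, which may or may not be what you want.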