I have this data frame:
df=data.frame(id=c(1,1,2,2,2,5,NA),var=c("a","a","b","b","b","e","f"),value=c(1,1,0,1,0,0,1),cs=c(2,2,3,3,3,3,NA))
I want to calculate the sum of value for each group (id, var) and then the cumulative sum but I would like to have the cumulative sum to be displayed for each row of data, i.e., I don't want to summarized view of the data. I have included what my output should look like. This is what I have tried so far:
df%>%arrange(id,var)%>%group_by(id,var)%>%mutate(cs=cumsum(value))
Any suggestions?
Here is an approach that I think meets your expectations.
Would group by id
and calculate the sum of value
for each id
via summarise
.
You can then add your cumulative sum column with mutate
. Based on your comments, I included an ifelse
so that if id
was NA
, it would not provide a cumulative sum, but instead be given NA
.
Finally, to combine your cumulative sum data with your original dataset, you would need to join the two tables.
library(tidyverse)
df %>%
arrange(id) %>%
group_by(id) %>%
summarise(sum = sum(value)) %>%
mutate(cs=ifelse(is.na(id), NA, cumsum(sum))) %>%
left_join(df)
Output
# A tibble: 7 x 5
id sum cs var value
<dbl> <dbl> <dbl> <fct> <dbl>
1 1 2 2 a 1
2 1 2 2 a 1
3 2 1 3 b 0
4 2 1 3 b 1
5 2 1 3 b 0
6 5 0 3 e 0
7 NA 1 NA f 1