I first calculated monthly sum by counting IDs within a group using group-by and mutate. However, as I try to use cumsum to calculate monthly cumulative sum by group using cumsum, the data just does not look right. It would be really helpful if someone can help me. Thanks~
Assuming original data frame looks like so:
person id | company | yyyy | month |
---|---|---|---|
1 | A | 2011 | January |
2 | A | 2011 | January |
3 | A | 2011 | Feburary |
4 | A | 2011 | Feburary |
5 | A | 2011 | Feburary |
6 | A | 2011 | March |
7 | B | 2011 | January |
8 | B | 2011 | January |
9 | B | 2011 | January |
10 | B | 2011 | Feburary |
11 | B | 2011 | Feburary |
12 | B | 2011 | Feburary |
13 | B | 2011 | Feburary |
14 | B | 2011 | March |
15 | B | 2011 | March |
16 | B | 2011 | April |
I would like to have a final output data frame like the one below.
person id | company | yyyy | month | monthly sum | cumulative monthly sum |
---|---|---|---|---|---|
1 | A | 2011 | January | 2 | 2 |
2 | A | 2011 | January | 2 | 2 |
3 | A | 2011 | Feburary | 3 | 5 |
4 | A | 2011 | Feburary | 3 | 5 |
5 | A | 2011 | Feburary | 3 | 5 |
6 | A | 2011 | March | 1 | 6 |
7 | B | 2011 | January | 3 | 3 |
8 | B | 2011 | January | 3 | 3 |
9 | B | 2011 | January | 3 | 3 |
10 | B | 2011 | Feburary | 4 | 7 |
11 | B | 2011 | Feburary | 4 | 7 |
12 | B | 2011 | Feburary | 4 | 7 |
13 | B | 2011 | Feburary | 4 | 7 |
14 | B | 2011 | March | 2 | 9 |
15 | B | 2011 | March | 2 | 9 |
16 | B | 2011 | April | 1 | 10 |
Only summing the first group value
library(dplyr)
df %>%
mutate(monthlySum = n_distinct(person.id),
cumulativeSum = row_number() == 1, .by = c(company, yyyy, month)) %>%
mutate(cumulativeSum = cumsum(if_else(cumulativeSum, monthlySum, 0)),
.by =c(company, yyyy))
person.id company yyyy month monthlySum cumulativeSum
1 1 A 2011 January 2 2
2 2 A 2011 January 2 2
3 3 A 2011 Feburary 3 5
4 4 A 2011 Feburary 3 5
5 5 A 2011 Feburary 3 5
6 6 A 2011 March 1 6
7 7 B 2011 January 3 3
8 8 B 2011 January 3 3
9 9 B 2011 January 3 3
10 10 B 2011 Feburary 4 7
11 11 B 2011 Feburary 4 7
12 12 B 2011 Feburary 4 7
13 13 B 2011 Feburary 4 7
14 14 B 2011 March 2 9
15 15 B 2011 March 2 9
16 16 B 2011 April 1 10