Say I have the following dataframe, initialized as df:
ID date value
A 01/2012 1
A 03/2012 2
A 05/2012 4
A 07/2012 3
A 09/2012 7
A 11/2012 1
A 01/2013 2
A 03/2013 8
A 05/2013 13
A 07/2013 2
A 09/2013 5
A 11/2013 2
B 01/2012 3
B 03/2012 9
B 05/2012 1
B 07/2012 0
B 09/2012 12
B 11/2012 3
B 01/2013 1
B 03/2013 4
B 05/2013 3
B 07/2013 3
B 09/2013 1
B 11/2013 1
Where the date variable is of the form mm/yyyy. I am looking for a way to create an additional column in this data frame that gives cumulative sum of the value column grouped by ID and year. For example, the following output is what I want:
ID date value cumsum
A 01/2012 1 1
A 03/2012 2 3
A 05/2012 4 7
A 07/2012 3 10
A 09/2012 7 17
A 11/2012 1 18
A 01/2013 2 2
A 03/2013 8 10
A 05/2013 13 23
A 07/2013 2 25
A 09/2013 5 30
A 11/2013 2 32
B 01/2012 3 3
B 03/2012 9 12
B 05/2012 1 13
B 07/2012 0 13
B 09/2012 12 25
B 11/2012 3 28
B 01/2013 1 1
B 03/2013 4 5
B 05/2013 3 8
B 07/2013 3 11
B 09/2013 1 12
B 11/2013 1 13
As you can see, the sum resets for each year but also for each ID. Essentially I am unsure of how to create a cumulative sum that is grouped by 2 columns instead of just 1. Any help would be appreciated
For completion here is a base R option -
transform(df, cumsum = ave(value, ID, sub('.*/', '', date), FUN = cumsum))
# ID date value cumsum
#1 A 01/2012 1 1
#2 A 03/2012 2 3
#3 A 05/2012 4 7
#4 A 07/2012 3 10
#5 A 09/2012 7 17
#6 A 11/2012 1 18
#7 A 01/2013 2 2
#8 A 03/2013 8 10
#9 A 05/2013 13 23
#10 A 07/2013 2 25
#11 A 09/2013 5 30
#12 A 11/2013 2 32
#13 B 01/2012 3 3
#14 B 03/2012 9 12
#15 B 05/2012 1 13
#16 B 07/2012 0 13
#17 B 09/2012 12 25
#18 B 11/2012 3 28
#19 B 01/2013 1 1
#20 B 03/2013 4 5
#21 B 05/2013 3 8
#22 B 07/2013 3 11
#23 B 09/2013 1 12
#24 B 11/2013 1 13