I have a data.table with ID, dates and values like the following one:
DT <- setDT(data.frame(ContractID= c(1,1,1,2,2), Date = c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Value = c(10,20,30,10,20)))
ContractID Date Value
1: 1 2018-02-01 10
2: 1 2018-02-20 20
3: 1 2018-03-12 30
4: 2 2018-02-01 10
5: 2 2018-02-12 20
I'd like to get a new column with the total cumulative sum per ID from a month ago until the current day for each row, like in the table below. NB: the third row is the sum of the second and the own third, because 2018-03-12 minus 1 month is greater than 2018-02-01, so we exclude the first row in the cum sum.
ContractID Date Value Cum_Sum_1M
1: 1 2018-02-01 10 10
2: 1 2018-02-20 20 30
3: 1 2018-03-12 30 50
4: 2 2018-02-01 10 10
5: 2 2018-02-12 20 30
Is there any way to achieve this using data.table?
Thank you!
Using tidyverse
and lubridate
, we first convert Date
to actual Date
object using as.Date
, then group_by
ContractID
and for each Date
sum
the Value
which is between current Date
and one month before the current Date
.
library(tidyverse)
library(lubridate)
DT %>%
mutate(Date = as.Date(Date)) %>%
group_by(ContractID) %>%
mutate(Cum_Sum_1M = map_dbl(1:n(), ~ sum(Value[(Date >= (Date[.] - months(1))) &
(Date <= Date[.])], na.rm = TRUE)))
# A tibble: 5 x 4
# Groups: ContractID [2]
# ContractID Date Value Cum_Sum_1M
# <dbl> <date> <dbl> <dbl>
#1 1 2018-02-01 10 10
#2 1 2018-02-20 20 30
#3 1 2018-03-12 30 50
#4 2 2018-02-01 10 10
#5 2 2018-02-12 20 30