I have an example data.table and I am trying to code 3 different cumulative sums by ID, one for each "Type". My data table is also sorted by date.
DT<-data.table(ID=c(A,A,A,A,B,B,B,C,C,C,C),date=c("2008-01-01","2009-01-01","2010-01-01","2011-01-01","1978-01-01","1982-01-01","1985-01-01","2001-01-01","2011-01-01","2015-01-01","2019-01-01"),Type=c(1,2,2,3,3,1,1,2,2,1,3),Days=c(18,333,26,57,48,10,17,,212,55,64,18))
I tried:
DT[Type==1&order(date),cumdays1:=cumsum(Days),by=ID]
DT[Type==2&order(date),cumdays2:=cumsum(Days),by=ID]
DT[Type==3&order(date),cumdays3:=cumsum(Days),by=ID]
However, this gives me NA for when the Type is different to the one I was calculating.
ID date Type Days cumdays1 cumdays2 cumdays3
1: A 2008-01-01 1 18 18 NA NA
2: A 2009-01-01 2 333 NA 333 NA
3: A 2010-01-01 2 26 NA 359 NA
4: A 2011-01-01 3 57 NA NA 57
5: B 1978-01-01 3 48 NA NA 48
6: B 1982-01-01 1 10 10 NA NA
7: B 1985-01-01 1 17 27 NA NA
8: C 2001-01-01 2 212 NA 212 NA
9: C 2011-01-01 2 55 NA 267 NA
10: C 2015-01-01 1 64 64 NA NA
11: C 2019-01-01 3 18 NA NA 18
I would like instead to continue calculate cumulative sum using Day=0 for these columns. I know I could code a separate Day variable for each Type, but is there a simpler code to do this at once?
My desired output
ID date Type Days cumdays1 cumdays2 cumdays3
1: A 2008-01-01 1 18 18 0 0
2: A 2009-01-01 2 333 18 333 0
3: A 2010-01-01 2 26 18 359 0
4: A 2011-01-01 3 57 18 359 57
5: B 1978-01-01 3 48 0 0 48
6: B 1982-01-01 1 10 10 0 48
7: B 1985-01-01 1 17 27 0 48
8: C 2001-01-01 2 212 0 212 0
9: C 2011-01-01 2 55 0 267 0
10: C 2015-01-01 1 64 64 267 0
11: C 2019-01-01 3 18 64 267 18
Using Base R:
transform(dt[order(dt$ID,dt$date),],
cumdays = apply(Days * diag(max(Type))[Type,],
2, \(x)ave(x,ID, FUN = cumsum)))
ID date Type Days cumdays.V1 cumdays.V2 cumdays.V3
1: A 2008-01-01 1 18 18 0 0
2: A 2009-01-01 2 333 18 333 0
3: A 2010-01-01 2 26 18 359 0
4: A 2011-01-01 3 57 18 359 57
5: B 1978-01-01 3 48 0 0 48
6: B 1982-01-01 1 10 10 0 48
7: B 1985-01-01 1 17 27 0 48
8: C 2001-01-01 2 212 0 212 0
9: C 2011-01-01 2 55 0 267 0
10: C 2015-01-01 1 64 64 267 0
11: C 2019-01-01 3 18 64 267 18
Tidyverse:
dt %>%
arrange(ID, date) %>%
mutate(name = Type) %>%
pivot_wider(id_cols = c(ID, date, Type),
names_prefix = 'cumdays',
values_from = Days, values_fill = 0) %>%
mutate(across(starts_with('cumdays'), cumsum), .by = ID)
# A tibble: 11 × 6
ID date Type cumdays1 cumdays2 cumdays3
<chr> <chr> <int> <int> <int> <int>
1 A 2008-01-01 1 18 0 0
2 A 2009-01-01 2 18 333 0
3 A 2010-01-01 2 18 359 0
4 A 2011-01-01 3 18 359 57
5 B 1978-01-01 3 0 0 48
6 B 1982-01-01 1 10 0 48
7 B 1985-01-01 1 27 0 48
8 C 2001-01-01 2 0 212 0
9 C 2011-01-01 2 0 267 0
10 C 2015-01-01 1 64 267 0
11 C 2019-01-01 3 64 267 18