I have a data.table that looks like this:
DT <- data.table(A=1:20, B=1:20*10, C=1:20*100)
DT
A B C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
...
20: 20 200 2000
I want to be able to calculate a new column "D" that has the first value as the average of the first 20 rows in column B as the first value, and then I want to use the first row of column D to help calculate the next row value of D.
Say the Average of the first 20 rows of column B is 105. and the formula for the next row in column D is this : DT$D[1]+DT$C[2] where I take the previous row value of D and add the row value of C. The third row will then look like this: DT$D[2]+DT$C[3]
A B C D
1: 1 10 100 105
2: 2 20 200 305
3: 3 30 300 605
4: 4 40 400 1005
5: 5 50 500 1505
...
20: 20 200 2000 21005
Any ideas on this would be made?
I think shift would be a great help to lag, but dont know how to get rid of the NA that it produces at the first instance?
We can take the mean of the first 20 rows of column B and add the cumulative sum of C
. The cumulative sum has one special consideration that we want to add a concatenation of 0
and column C without the first value.
DT[, D := mean(B[1:20]) + cumsum(c(0, C[-1]))][]
# A B C D
# 1: 1 10 100 105
# 2: 2 20 200 305
# 3: 3 30 300 605
# 4: 4 40 400 1005
# 5: 5 50 500 1505
# 6: 6 60 600 2105
# 7: 7 70 700 2805
# 8: 8 80 800 3605
# 9: 9 90 900 4505
# 10: 10 100 1000 5505
# 11: 11 110 1100 6605
# 12: 12 120 1200 7805
# 13: 13 130 1300 9105
# 14: 14 140 1400 10505
# 15: 15 150 1500 12005
# 16: 16 160 1600 13605
# 17: 17 170 1700 15305
# 18: 18 180 1800 17105
# 19: 19 190 1900 19005
# 20: 20 200 2000 21005