Search code examples
rdata.tablelagshift

Use previous calculated row value in r


I have a data.table that looks like this:

DT <- data.table(A=1:20, B=1:20*10, C=1:20*100)
DT
    A  B   C
1:  1  10  100
2:  2  20  200
3:  3  30  300
4:  4  40  400
5:  5  50  500
...
20: 20 200 2000

I want to be able to calculate a new column "D" that has the first value as the average of the first 20 rows in column B as the first value, and then I want to use the first row of column D to help calculate the next row value of D.

Say the Average of the first 20 rows of column B is 105. and the formula for the next row in column D is this : DT$D[1]+DT$C[2] where I take the previous row value of D and add the row value of C. The third row will then look like this: DT$D[2]+DT$C[3]

    A  B   C    D
1:  1   10   100     105
2:  2   20   200     305
3:  3   30   300     605
4:  4   40   400    1005
5:  5   50   500    1505
...
20: 20  200  2000  21005

Any ideas on this would be made?

I think shift would be a great help to lag, but dont know how to get rid of the NA that it produces at the first instance?


Solution

  • We can take the mean of the first 20 rows of column B and add the cumulative sum of C. The cumulative sum has one special consideration that we want to add a concatenation of 0 and column C without the first value.

    DT[, D := mean(B[1:20]) + cumsum(c(0, C[-1]))][]
    #      A   B    C     D
    #  1:  1  10  100   105
    #  2:  2  20  200   305
    #  3:  3  30  300   605
    #  4:  4  40  400  1005
    #  5:  5  50  500  1505
    #  6:  6  60  600  2105
    #  7:  7  70  700  2805
    #  8:  8  80  800  3605
    #  9:  9  90  900  4505
    # 10: 10 100 1000  5505
    # 11: 11 110 1100  6605
    # 12: 12 120 1200  7805
    # 13: 13 130 1300  9105
    # 14: 14 140 1400 10505
    # 15: 15 150 1500 12005
    # 16: 16 160 1600 13605
    # 17: 17 170 1700 15305
    # 18: 18 180 1800 17105
    # 19: 19 190 1900 19005
    # 20: 20 200 2000 21005