Search code examples
rdata.tablepanel-datadata-management

fill datatable column iteratively


Using initial values, I want to fill iteratively NAs in a data.table column based on growth rates stored in a separate column, by id.

Take the following data.table as an example:

library(data.table)
DT <- data.table(id = c("A","A","A","A","B","B","B","B"),  date=1:4,
 growth=1L+runif(8), index= c(NA,250,NA,NA,NA,300,NA,NA))

> DT
   id date   growth index
1:  A    1 1.654628    NA
2:  A    2 1.770219   250
3:  A    3 1.255893    NA
4:  A    4 1.185985    NA
5:  B    1 1.826187    NA
6:  B    2 1.055251   300
7:  B    3 1.180389    NA
8:  B    4 1.204108    NA

Basically, what I need id for index values after date 2:

index_{i,t} = growth_{i,t}*index_{i,t-1}

And, for values before date 2:

index_{i,t} = index_{i,t-1}/growth_{i,t-1}

I had a go using shift, but this replace just index at t+1:

DT[, index := growth * shift(index,1L, type="lag")]  

UPDATE The desired result looks like that

> DT
   id date   growth    index
1:  A    1 1.440548 141.2255
2:  A    2 1.395092 250.0000
3:  A    3 1.793094 313.9733
4:  A    4 1.784224 372.3676
5:  B    1 1.129264 284.2926
6:  B    2 1.978359 300.0000
7:  B    3 1.228979 354.1167
8:  B    4 1.453433 426.3948

Solution

  • First, we'll define a function which takes two vectors, values and growths, that

    1. Finds the first non-NA value in values
    2. Determines the ratio of each element in values to the non-NA one by multiplying all the growths between it and the non-NA.
    3. Does that multiplication

    Note that this won't catch situations where there's more than one non-NA value, and it will error if values only has NAs. But I leave exception-handling to you, because you'll know best what to do.

    apply_growth <- function(values, growths) {
      given <- which(!is.na(values))[1]
    
      cumulative_growth <- vapply(
        X = seq_along(growths),
        FUN.VALUE = numeric(1),
        FUN = function(x) {
          if (x < given) {
            1 / prod(growths[seq(x + 1, given)])
          } else if (x > given) {
            prod(growths[seq(given + 1, x)])
          } else if (x == given) {
            1
          }
        }
      )
    
      values[given] * cumulative_growth
    }
    

    Now we'll apply it to each subgroup of DT. Just to be sure, we'll specify the rows have to be ordered by date.

    DT[
      order(date),
      index := apply_growth(index, growth),
      by = id
    ]
    
    DT
    #    id date   growth    index
    # 1:  A    1 1.993863 180.7514
    # 2:  A    2 1.383115 250.0000
    # 3:  A    3 1.350102 337.5256
    # 4:  A    4 1.863802 629.0809
    # 5:  B    1 1.664999 249.2398
    # 6:  B    2 1.203660 300.0000
    # 7:  B    3 1.595310 478.5931
    # 8:  B    4 1.002311 479.6989