Tags: r, plyr, dplyr, apply, na

Fill NA values with the trailing row value times a growth rate?


What would be a good way to populate NA values with the previous value times (1 + growth)?

df <- data.frame(
  year = 0:6,
  price1 = c(1.1, 2.1, 3.2, 4.8, NA, NA, NA),
  price2 = c(1.1, 2.1, 3.2, NA, NA, NA, NA)
)
growth <- .02

In this case, I would want the missing values in price1 to be filled with 4.8*1.02, 4.8*1.02^2, and 4.8*1.02^3. Similarly, I would want the missing values in price2 to be filled with 3.2*1.02, 3.2*1.02^2, 3.2*1.02^3, and 3.2*1.02^4.
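For reference, the target values can be written out directly. This is only a sketch of the arithmetic described above; the names `expected_price1` and `expected_price2` are illustrative and not part of the original data.

```r
growth <- 0.02

# Expected fills for price1: last observed value 4.8, compounded over 1 to 3 periods
expected_price1 <- 4.8 * (1 + growth)^(1:3)

# Expected fills for price2: last observed value 3.2, compounded over 1 to 4 periods
expected_price2 <- 3.2 * (1 + growth)^(1:4)
```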

I've tried this, but I think it needs to be set to repeat somehow (apply?):

library(dplyr)
df %>%
  mutate(price1 = ifelse(is.na(price1),
    lag(price1) * (1 + growth), price1
  ))

I'm not using dplyr for anything else (yet), so something from base R or plyr or similar would be appreciated.
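A rough illustration of why a single pass falls short, written in base R so it runs without dplyr. The shifted vector `lagged` is an assumed stand-in for what `lag(price1)` returns, and `one_pass` is a name made up for this sketch.

```r
price1 <- c(1.1, 2.1, 3.2, 4.8, NA, NA, NA)
growth <- 0.02

# Base R stand-in for dplyr::lag(): shift the vector down by one position
lagged <- c(NA, head(price1, -1))

# Same ifelse() logic as the attempt above, applied once
one_pass <- ifelse(is.na(price1), lagged * (1 + growth), price1)

# Only position 5 is filled. Positions 6 and 7 stay NA because their
# lagged values were already NA when the shift was computed.
```

The shift is computed once from the original column, so each later NA only ever sees another NA as its predecessor.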


Solution

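  • It looks like dplyr can't access newly assigned values through lag(): the lagged column is computed once from the original data, so only the first NA in each run gets filled. Here is a solution that should work even if the NAs are in the middle of a column.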

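    # Note: apply() returns a matrix here; wrap the call in as.data.frame()
    # if a data frame is needed afterwards.
    df <- apply(
      df, 2, function(x){
        # Nothing to fill in this column
        if(sum(is.na(x)) == 0){return(x)}
        ## updated with optimized portion from @josilber
        # Run-length encode the NA mask to locate each run of NAs
        r <- rle(is.na(x))
        na.loc <- which(r$values)
        # Index of the last observed value before each NA run, repeated once
        # per NA in that run (assumes no column starts with NA)
        b <- rep(cumsum(r$lengths)[na.loc-1], r$lengths[na.loc])
        lastValIs <- 1:length(x)
        lastValIs[is.na(x)] <- b
        # Compound the last observed value over the number of periods elapsed
        x[is.na(x)] <-
          sapply(which(is.na(x)), function(i){
            return(x[lastValIs[i]]*(1 + growth)^(i - lastValIs[i]))
          })
        return(x)
      })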
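The same idea can also be sketched without `rle()` or `sapply()`. This is a vectorized base R variant, not the answer's own method: `fill_growth` is a hypothetical helper name, and it leaves NAs at the start of a column untouched because there is no earlier value to grow from.

```r
# Hypothetical helper: fill each NA with the last observed value,
# compounded by `growth` for every period since that observation.
fill_growth <- function(x, growth) {
  idx <- seq_along(x)
  # Position of the most recent non-NA value at each index (0 if none yet)
  last_obs <- cummax(ifelse(is.na(x), 0L, idx))
  # Only fill NAs that have an earlier observed value
  fill <- is.na(x) & last_obs > 0
  out <- x
  out[fill] <- x[last_obs[fill]] * (1 + growth)^(idx[fill] - last_obs[fill])
  out
}

df <- data.frame(
  year = 0:6,
  price1 = c(1.1, 2.1, 3.2, 4.8, NA, NA, NA),
  price2 = c(1.1, 2.1, 3.2, NA, NA, NA, NA)
)
growth <- 0.02

# Apply only to the price columns so the result stays a data frame
price_cols <- c("price1", "price2")
df[price_cols] <- lapply(df[price_cols], fill_growth, growth = growth)
```

Assigning through `lapply()` keeps `df` a data frame and leaves `year` alone, whereas `apply()` would convert everything to a matrix.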