Search code examples
rfunctionnarollapply

Use value from latest non missing row and apply rolling function


i have data like in

DT <- data.frame(id=rep("A",times=10),B=1:10, C=c(NA,2:5,NA,NA,NA,NA,NA))
CT <- DT[,c(2,3)]*3
CT$id <- rep("B",times=10)
DT <- rbind(DT,CT)

I would like to fill in NAs in column C with its previous value plus function of value of other column, column B (ignoring the first NA in col C), for example:

DT$C[6] =DT$C[5]+DT$B[6]*0.3
DT$C[7] =DT$C[6]+DT$B[7]*0.3
DT$C[8] =DT$C[7]+DT$B[8]*0.3

etc.

Also, i need to replicate it by the id value (column id in the table). Example pic with three manually computed values is below. Thanks for suggestions!

enter image description here


Solution

  • The formula for filling in NA cells is equivalent to filling them in using na.locf plus taking 0.3 times the cumsum of the B values corresponding to the NA values in C.

    So first create an expression for a grouping vector associating a unique number to each non-NA and each stretch of consecutive NAs.

    rleid(seq_along(C) * !is.na(C)))
    

    For each of those groups compute the vector of cumulative sums of B * is.na(C) which is 0 if the group has one non-NA element and is cumsum if the group consists of consecutive NAs.

    This gives the following single statement solution:

    library(data.table)
    library(zoo)
    
    transform(DT, C = ave(C, id, FUN = na.locf0) + 
          0.3 * ave(B * is.na(C), rleid(seq_along(C) * !is.na(C)), id, FUN = cumsum))
    

    giving:

       id  B    C
    1   A  1   NA
    2   A  2  2.0
    3   A  3  3.0
    4   A  4  4.0
    5   A  5  5.0
    6   A  6  6.8
    7   A  7  8.9
    8   A  8 11.3
    9   A  9 14.0
    10  A 10 17.0
    11  B  3   NA
    12  B  6  6.0
    13  B  9  9.0
    14  B 12 12.0
    15  B 15 15.0
    16  B 18 20.4
    17  B 21 26.7
    18  B 24 33.9
    19  B 27 42.0
    20  B 30 51.0