i have data like in
DT <- data.frame(id=rep("A",times=10),B=1:10, C=c(NA,2:5,NA,NA,NA,NA,NA))
CT <- DT[,c(2,3)]*3
CT$id <- rep("B",times=10)
DT <- rbind(DT,CT)
I would like to fill in NAs in column C with its previous value plus function of value of other column, column B (ignoring the first NA in col C), for example:
DT$C[6] =DT$C[5]+DT$B[6]*0.3
DT$C[7] =DT$C[6]+DT$B[7]*0.3
DT$C[8] =DT$C[7]+DT$B[8]*0.3
etc.
Also, i need to replicate it by the id value (column id in the table). Example pic with three manually computed values is below. Thanks for suggestions!
The formula for filling in NA cells is equivalent to filling them in using na.locf
plus taking 0.3 times the cumsum of the B
values corresponding to the NA values in C
.
So first create an expression for a grouping vector associating a unique number to each non-NA and each stretch of consecutive NAs.
rleid(seq_along(C) * !is.na(C)))
For each of those groups compute the vector of cumulative sums of B * is.na(C)
which is 0 if the group has one non-NA element and is cumsum
if the group consists of consecutive NAs.
This gives the following single statement solution:
library(data.table)
library(zoo)
transform(DT, C = ave(C, id, FUN = na.locf0) +
0.3 * ave(B * is.na(C), rleid(seq_along(C) * !is.na(C)), id, FUN = cumsum))
giving:
id B C
1 A 1 NA
2 A 2 2.0
3 A 3 3.0
4 A 4 4.0
5 A 5 5.0
6 A 6 6.8
7 A 7 8.9
8 A 8 11.3
9 A 9 14.0
10 A 10 17.0
11 B 3 NA
12 B 6 6.0
13 B 9 9.0
14 B 12 12.0
15 B 15 15.0
16 B 18 20.4
17 B 21 26.7
18 B 24 33.9
19 B 27 42.0
20 B 30 51.0