Assume we have a data.table like:
library(data.table)
set.seed(123666)
dt <- data.table(
  id = seq(1, 5),
  # each column needs 5 values to match id (sample1 shows 3 NA and 2 numbers)
  sample1 = c(sample(c(NA, NA, runif(2))), NA),
  sample2 = c(NA, sample(c(NA, runif(3)))),
  sample3 = c(sample(c(NA, runif(4))))
)
dt
id sample1 sample2 sample3
1: 1 NA NA 0.6387276
2: 2 0.9293370 0.1875354 0.2087892
3: 3 0.1528115 NA 0.7849779
4: 4 NA 0.6875024 0.3684756
5: 5 NA 0.4859773 NA
It has many NA values. To fill them with a constant, we can typically use the following syntax:
dt[is.na(dt)] <- 0
dt
id sample1 sample2 sample3
1: 1 0.0000000 0.0000000 0.6387276
2: 2 0.9293370 0.1875354 0.2087892
3: 3 0.1528115 0.0000000 0.7849779
4: 4 0.0000000 0.6875024 0.3684756
5: 5 0.0000000 0.4859773 0.0000000
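For the constant case, data.table (assuming version 1.12.4 or later) also provides setnafill(), which fills NA in numeric (integer or double) columns by reference instead of going through the logical-matrix assignment. A minimal sketch on a small hypothetical table dt0:

```r
library(data.table)

# Small hypothetical table with NA values in numeric columns
dt0 <- data.table(
  id      = c(1, 2, 3),
  sample1 = c(NA, 0.5, NA),
  sample2 = c(0.2, NA, 0.7)
)

# Replace every NA in the listed numeric columns with 0, in place (no copy of dt0)
setnafill(dt0, type = "const", fill = 0, cols = c("sample1", "sample2"))
dt0
```

Because it only handles numeric columns, setnafill() cannot cover the custom-rule case below, where the fill value depends on the row and column.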
However, we may want to fill each NA with a more complex rule, for example with a custom function calc_data(). This function needs two inputs (this is just an example): first, the column name or column index of the cell, and second, the id value of the row that contains the NA.
# example only, not the real function
# offset used for each sample column
sample_value <- c(sample1 = 1, sample2 = 3, sample3 = 3)
calc_data <- function(sample, id) {
  # sample: column name or index into sample_value; id: id value(s) of the NA rows
  id * 3 + sample_value[sample]
}
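To make the inputs concrete, here is a quick check of what calc_data() returns for the NA ids of sample1 (1, 4 and 5 in the table above). The snippet repeats the hypothetical lookup and rule so it runs on its own, and shows that the column can be given either by index or by name:

```r
# Hypothetical per-column offsets, as in the example above
sample_value <- c(sample1 = 1, sample2 = 3, sample3 = 3)

# Example rule: id * 3 + offset of the column; vectorised over id
calc_data <- function(sample, id) {
  id * 3 + sample_value[sample]
}

# Column given by index (1 = sample1)
filled_by_index <- calc_data(1, c(1, 4, 5))
# Column given by name: same rule, same values
filled_by_name <- calc_data("sample1", c(1, 4, 5))
filled_by_index  # values 4, 13, 16
```

Since the function is vectorised over id, all NA rows of one column can be filled with a single call.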
Starting from the original dt with its NA values, is it possible to fill the NA values with this custom function using data.table syntax? How can the required values be passed to calc_data? Perhaps something like this could work:
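for (j in 2:4) {
  na_rows <- which(is.na(dt[[j]]))
  # dt$id[na_rows] is a plain vector; dt[na_rows, "id"] would be a one-column data.table
  if (length(na_rows) > 0)
    set(dt, i = na_rows, j = j, value = calc_data(j - 1, dt$id[na_rows]))
}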
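Output (the non-NA values here come from a different random draw than the table above; the filled cells follow id * 3 + sample_value, e.g. 3 * 3 + 1 = 10 for id 3 in sample1):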
id sample1 sample2 sample3
1: 1 0.8055845 6.00000000 0.2030456
2: 2 0.5705721 9.00000000 0.7954992
3: 3 10.0000000 0.09605308 12.0000000
4: 4 13.0000000 0.25545666 0.6506906
5: 5 0.8055845 0.51889032 0.8931946
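An alternative that stays closer to data.table's own syntax is to update each column with := restricted to its NA rows, so that inside j the id column already holds only the ids of those rows. A minimal sketch, assuming the hypothetical calc_data() rule above with the column name as its first argument, and with the table from the question typed in directly (rounded values) so the snippet runs on its own:

```r
library(data.table)

# Hypothetical lookup and rule, as in the example above
sample_value <- c(sample1 = 1, sample2 = 3, sample3 = 3)
calc_data <- function(sample, id) {
  id * 3 + sample_value[sample]
}

# The printed table from the question, with its NA positions
dt <- data.table(
  id      = c(1, 2, 3, 4, 5),
  sample1 = c(NA, 0.9293370, 0.1528115, NA, NA),
  sample2 = c(NA, 0.1875354, NA, 0.6875024, 0.4859773),
  sample3 = c(0.6387276, 0.2087892, 0.7849779, 0.3684756, NA)
)

cols <- c("sample1", "sample2", "sample3")
for (col in cols) {
  # i keeps only this column's NA rows; in j, id refers to those rows only
  dt[is.na(get(col)), (col) := calc_data(col, id)]
}
dt
```

Like set(), := modifies dt by reference, so no copy of the table is made inside the loop.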