I have a dataframe (or datatable, if that's easier) with incomplete rows:
ID Var1 Var2 Var3
1 2 5 1
2 12 3
3 8
4 4
d <- data.frame(
ID = 1:4,
Var1 = c(2, 12, 8, 4),
Var2 = c(5, 3, NA, NA),
Var3 = c(1, NA, NA, NA)
)
library(data.table)
d <- fread("
ID Var1 Var2 Var3
1 2 5 1
2 12 3 NA
3 8 NA NA
4 4 NA NA
")
The empty cells are always at the end of a row.
I would like to fill the empty cells in each row with the value in the last non-empty cell in that row, e.g.:
ID Var1 Var2 Var3
1 2 5 1
2 12 3 -> 3
3 8 -> 8 -> 8
4 4 -> 4 -> 4
How do I do that?
I don't want to use dplyr and I don't want to fill columns.
Another answer using the collapse
package, which has the advantage of being particularly fast(er than data.table
):
library(collapse)
dapply(d, na_locf, MARGIN = 1)
# ID Var1 Var2 Var3
# 1 1 2 5 1
# 2 2 12 3 3
# 3 3 8 8 8
# 4 4 4 4 4
Microbenchmark:
# Unit: microseconds
# expr min lq mean median uq max neval
# collapse 69.5 112.95 244.847 135.45 161.7 9964.4 100
# dt 592.9 788.70 1237.643 874.70 1186.6 14563.1 100
# tidyr 32283.2 36170.80 41293.420 40501.55 43809.1 75417.8 100
# Reduce_dt 645.0 803.70 1083.373 954.05 1222.6 2367.5 100
# Reduce_TiC 383.9 499.25 661.475 586.40 687.6 5179.1 100
Code for benchmark:
microbenchmark::microbenchmark(
collapse = dapply(d, na_locf, MARGIN = 1),
dt = data.table::transpose(
setnafill(data.table::transpose(d, keep.names = "ID"),
type = "locf", cols = 1:nrow(d) + 1), make.names = "ID"),
tidyr = d %>%
pivot_longer(-ID) %>%
fill(value, .direction = "down") %>%
pivot_wider(),
Reduce_dt = d[, (sprintf("Var%d", 1:3)) := Reduce(\(x, y) ifelse(is.na(y), x, y), .SD, accumulate = TRUE), .SDcols = sprintf("Var%d", 1:3)],
Reduce_TiC = Reduce(\(x, y) ifelse(is.na(y), x, y), d[-1], accumulate = TRUE)
)