I have a primary dataset like this.
Id Date Col1 Col2 Col3
1211 01/14/2009 1 2 1
1211 03/19/2001 NA 0 2
1019 02/21/1999 2 1 NA
2023 03/09/2002 1 NA 0
1245 05/14/2022 0 1 NA
My second dataset is a lookup dataset with mode values of each colum
Col1 Col2 Col3
0 1 0
I like to impute the mode values from a second lookup dataset for missing values in 1st dataset.
Basically any missing values should be imputed with the mode value in second lookup dataset.
Expecting a final dataset like this.
Id Date Col1 Col2 Col3
1211 01/14/2009 1 2 1
1211 03/19/2001 0 0 2
1019 02/21/1999 2 1 0
2023 03/09/2002 1 1 0
1245 05/14/2022 0 1 0
Right now I am doing this with very clunky code which is very manual and takes a lot of lines. Any suggestions on how to efficiently impute based on lookup table values is much appreciated. Thanks.
library(dplyr)
library(tidyr)
df <- tibble::tribble(
~Col1, ~Col2, ~Col3,
1, 2, 1,
NA, 0, 2,
2, 1, NA,
1, NA, 0,
0, 1, NA
)
lkp <- data.frame(
Col1 = 0,
Col2 = 1,
Col3 = 0
)
df |>
mutate(across(starts_with("Col"), \(x) replace_na(x, lkp[[cur_column()]])))