Search code examples
rmissing-datalookup-tables

R impute from a look up table


I have a primary dataset like this.

 Id   Date         Col1    Col2    Col3
 1211 01/14/2009   1       2       1  
 1211 03/19/2001   NA      0       2
 1019 02/21/1999   2       1       NA
 2023 03/09/2002   1       NA      0
 1245 05/14/2022   0       1       NA

My second dataset is a lookup dataset with mode values of each colum

   Col1     Col2     Col3
   0        1        0

I like to impute the mode values from a second lookup dataset for missing values in 1st dataset.

  • Col1 , has missing value in row2, so this should be imputed with 0
  • Col2 , has missing value in row4, so this should be imputed with 1
  • Col3 , has missing values in row 3, row 5 so this should be imputed with 0

Basically any missing values should be imputed with the mode value in second lookup dataset.

Expecting a final dataset like this.

 Id   Date         Col1    Col2    Col3
 1211 01/14/2009   1       2       1  
 1211 03/19/2001   0       0       2
 1019 02/21/1999   2       1       0
 2023 03/09/2002   1       1       0
 1245 05/14/2022   0       1       0

Right now I am doing this with very clunky code which is very manual and takes a lot of lines. Any suggestions on how to efficiently impute based on lookup table values is much appreciated. Thanks.


Solution

  • library(dplyr)
    library(tidyr)
    
    df <- tibble::tribble(
      ~Col1, ~Col2, ~Col3,
      1, 2, 1,
      NA, 0, 2,
      2, 1, NA,
      1, NA, 0,
      0, 1, NA
    )
    
    lkp <- data.frame(
      Col1 = 0,
      Col2 = 1,
      Col3 = 0
    )
    
    df |>
      mutate(across(starts_with("Col"), \(x) replace_na(x, lkp[[cur_column()]])))