Search code examples
rloopsimputation

How to impute NAs in many columns with the values from other columns?


I have a dataset with 200 variables that all have some missing values. Each of the 200 variables has another column that I want to use to impute the missing values.

Example data:

have <- data.frame(ID = c(1:10), var1 = c(runif(7), NA, NA, NA), var1_fill = runif(10))

   ID       var1     var1_fill
1   1 0.68783885   0.140508053
2   2 0.74672512   0.001270443
3   3 0.09607276   0.917535359
4   4 0.03222775   0.363960434
5   5 0.03560543   0.901288399
6   6 0.46595122   0.725499220
7   7 0.42781890   0.781295939
8   8         NA   0.737999219
9   9         NA   0.456795266
10 10         NA   0.314562042

If I wanted to impute for one column I would use this code:

have$var1_imputed <- ifelse(is.na(have$var1) == T, have$var1_fill, have$var1)

   ID       var1     var1_fill var1_imputed
1   1 0.68783885   0.140508053   0.68783885
2   2 0.74672512   0.001270443   0.74672512
3   3 0.09607276   0.917535359   0.09607276
4   4 0.03222775   0.363960434   0.03222775
5   5 0.03560543   0.901288399   0.03560543
6   6 0.46595122   0.725499220   0.46595122
7   7 0.42781890   0.781295939   0.42781890
8   8         NA   0.737999219   0.73799922
9   9         NA   0.456795266   0.45679527
10 10         NA   0.314562042   0.31456204

I am having trouble figuring out how to write a loop to do this for 200 variables, because I can't use the $ to reference the column names. In the actual dataset, the variable names don't follow any pattern like var1, var2, etc. However, the original 200 variables are located in columns 7 to 206, and the columns to use for the respective imputation are 207 to 406. The columns for imputation also have the same name as the original columns but with an additional suffix, like in the example (var1 and var1_fill).


Solution

  • A data.table option using fcoalesce

    setDT(df)[
      ,
      setNames(
        Map(fcoalesce, .SD[, 7:206], .SD[, 207:406]),
        paste0(names(.SD[, 7:206]), "_imputed")
      )
    ]