Search code examples
rmissing-datamodeimputation

r replace missing values with a constant and column name follow a common pattern


My dataset has columns and values like this. The column names all start with a common string, Col_a_**

 ID    Col_a_01    Col_a_02    Col_a_03
 1     1           2           1
 2     1           NA          0
 3     NA          0           2
 4     1           0           1
 5     0           0           2

My goal is to replace the missing values with the mode values for that column.

The expected dataset to be like this

  ID    Col_a_01    Col_a_02    Col_a_03
  1     1           2           1
  2     1           0**         0
  3     1**         0           2
  4     1           0           1
  5     0           0           2

The NA in the first column is replaced by 1 because the mode of the 1st column is 1. The NA in the second column is replaced by 0 because the mode for the 2nd column is 0.

I can do this like this below

getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

 df$Col_a_01[is.na(Col_a_01)==TRUE] <- getmode(df$Col_a_01)
 df$Col_a_03[is.na(Col_a_02)==TRUE] <- getmode(df$Col_a_02)
 df$Col_a_03[is.na(Col_a_03)==TRUE] <- getmode(df$Col_a_03)

But this becomes unwieldy if I have 100 columns starting with the similar names ending in 1,2,3..100. I am curious if there is an easier and more elegant way to accomplish this. Thanks in advance.


Solution

  • We can use na.aggregate with FUN specified as getmode

    library(zoo)
    library(dplyr)
    df1 <- df1 %>%
       mutate(across(starts_with('Col_a'), na.aggregate, FUN = getmode))
    

    -output

    df1
      ID Col_a_01 Col_a_02 Col_a_03
    1  1        1        2        1
    2  2        1        0        0
    3  3        1        0        2
    4  4        1        0        1
    5  5        0        0        2
    

    Or it can be simply

    na.aggregate(df1, FUN = getmode)
    ID Col_a_01 Col_a_02 Col_a_03
    1  1        1        2        1
    2  2        1        0        0
    3  3        1        0        2
    4  4        1        0        1
    5  5        0        0        2