r, matrix, na, median, imputation

Replacing NA's in each column of matrix with the median of that column


I am trying to replace the NAs in each column of a matrix with the median of that column. When I try to use lapply or sapply I get an error, but the code works when I use a for-loop or change one column at a time. What am I doing wrong?

Example:

set.seed(1928)
mat <- matrix(rnorm(100*110), ncol = 110)
mat[sample(1:length(mat), 700, replace = FALSE)] <- NA
mat1 <- mat2 <- mat

mat1 <- lapply(mat1,
  function(n) {
     mat1[is.na(mat1[,n]),n] <- median(mat1[,n], na.rm = TRUE)
  }
)   

for (n in 1:ncol(mat2)) {
  mat2[is.na(mat2[,n]),n] <- median(mat2[,n], na.rm = TRUE)
}

Solution

  • I would suggest vectorizing this with the matrixStats package instead of calculating a median per column with either of the loops (sapply is also a loop, in the sense that it evaluates a function in each iteration).

    First, we will create an index of the NA positions (one row per NA, holding its row and column number)

    indx <- which(is.na(mat), arr.ind = TRUE)
    

    Then, replace the NAs with the precalculated column medians, picking the median for each NA by the column number stored in the index

    mat[indx] <- matrixStats::colMedians(mat, na.rm = TRUE)[indx[, 2]]
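
As for why the lapply attempt fails: lapply(mat1, ...) treats the matrix as a plain vector and calls the function once per cell, so n is a cell value (or NA) rather than a column number, and the assignment inside the function only changes a local copy of mat1. The sketch below is one possible way to see the index approach on a tiny matrix and to check it against the for-loop from the question. It uses base R's apply(mat, 2, median, na.rm = TRUE) as a stand-in for matrixStats::colMedians so that it runs without extra packages; the names m, idx_small, by_loop, by_apply and by_index are made up for illustration.

```r
# Tiny matrix so the index is easy to read: NAs sit at (2, 1) and (1, 3).
m <- matrix(c(1, NA, 3, 4, NA, 6), nrow = 2)
idx_small <- which(is.na(m), arr.ind = TRUE)  # two columns: row, col

# Base R stand-in for matrixStats::colMedians (assumption: used here only
# to avoid a package dependency).
m[idx_small] <- apply(m, 2, median, na.rm = TRUE)[idx_small[, 2]]

# Same data as in the question.
set.seed(1928)
mat <- matrix(rnorm(100 * 110), ncol = 110)
mat[sample(1:length(mat), 700, replace = FALSE)] <- NA

# Reference result: the for-loop from the question.
by_loop <- mat
for (n in seq_len(ncol(by_loop))) {
  by_loop[is.na(by_loop[, n]), n] <- median(by_loop[, n], na.rm = TRUE)
}

# apply() over columns: the function receives a whole column and must
# return the modified column, which apply() then binds back into a matrix.
by_apply <- apply(mat, 2, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

# Index approach from the answer, with base R medians.
by_index <- mat
idx <- which(is.na(mat), arr.ind = TRUE)
by_index[idx] <- apply(mat, 2, median, na.rm = TRUE)[idx[, 2]]

# Compare the three results.
print(all.equal(by_loop, by_apply))
print(all.equal(by_loop, by_index))
```

If a loop-style solution is preferred, the apply() form is the closest working equivalent of the original lapply call; the index form avoids calling a function per column for the replacement step.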