
Cell Mean Imputation


I wish I was better at R, but I need some help with something pretty basic.

I am having some problems writing a function that will do cell mean imputation. The data I am currently working with has 3 columns, and with the function written the way it is, the mean of the observed values in the 3rd column is imputed into every NA in all 3 columns. How can I fix this? Thank you!

cellmean.imp <- function(a){
  for (i in 1:dim(a)[2]){
    new=replace(a, is.na(a), mean(a[, i], na.rm=TRUE))
  }
  return(new)
}
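Two things in the loop produce this behaviour. First, `is.na(a)` selects the NAs of the whole object, not just column `i`. Second, `new` is rebuilt from the unchanged `a` on every pass, so only the last iteration (column 3) survives. Here is a minimal reproduction using a small hypothetical matrix `m` with one NA per column:

```r
# Hypothetical 3x3 example; matrix() fills column-wise, so each column has one NA:
# col 1 = 1, NA, 3   col 2 = 4, 5, NA   col 3 = NA, 8, 9
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              NA, 8, 9), nrow = 3)

# The function from the question, unchanged
cellmean.imp <- function(a){
  for (i in 1:dim(a)[2]){
    # is.na(a) covers the whole matrix, and `new` is overwritten each pass
    new=replace(a, is.na(a), mean(a[, i], na.rm=TRUE))
  }
  return(new)
}

out <- cellmean.imp(m)
# Every former NA holds the observed mean of column 3, (8 + 9) / 2 = 8.5
out[is.na(m)]
```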

Sorry, I forgot to add: I am trying to impute the mean of the observed values in the 1st column into the NAs in the first column, then the mean of the observed values in the 2nd column into the NAs in the second column, and so on.
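A minimal sketch of the loop adjusted to do exactly that, assuming `a` is a numeric matrix or a data frame of numeric columns: restrict both the NA test and the assignment to column `i`, and modify `a` in place so that earlier columns are not lost. The name `na_rows` is my own. A column with no observed values would receive `NaN`, because `mean()` of an empty vector is `NaN`.

```r
# Per-column (cell) mean imputation: each NA gets the mean of the observed
# values in its own column.
cellmean.imp <- function(a) {
  for (i in seq_len(ncol(a))) {
    na_rows <- is.na(a[, i])                      # NAs in column i only
    a[na_rows, i] <- mean(a[, i], na.rm = TRUE)   # fill with column i's mean
  }
  a                                               # return the modified copy
}

# Usage on a small hypothetical matrix with one NA per column
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              NA, 8, 9), nrow = 3)
imputed <- cellmean.imp(m)
# Observed column means are 2, 4.5 and 8.5, so those values fill the NAs
imputed
```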


Solution

  • If I understand you correctly, you want to put the mean of the third column into every NA in the matrix:

    x <- matrix(rnorm(30),10,3)
    

    Introduce a few NAs

    x[3,1] <- NA
    x[4,1] <- NA
    x[5,2] <- NA
    x[6,3] <- NA
    

    Replace them with the third column mean

    x[is.na(x)] <- mean(x[,3],na.rm=TRUE)
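The line above reproduces the original behaviour, filling every NA with the third column's mean. For the per-column version described in the clarification, a vectorized alternative to the loop is sketched below, assuming `x` is a numeric matrix. `which(is.na(x), arr.ind = TRUE)` returns the row and column of each NA, and the column index selects the matching entry from `colMeans(x, na.rm = TRUE)`. The names `col_means` and `na_idx` are illustrative.

```r
x <- matrix(rnorm(30), 10, 3)
x[3, 1] <- NA; x[4, 1] <- NA; x[5, 2] <- NA; x[6, 3] <- NA

col_means <- colMeans(x, na.rm = TRUE)      # mean of the observed values per column
na_idx <- which(is.na(x), arr.ind = TRUE)   # two-column matrix: "row", "col"
x[na_idx] <- col_means[na_idx[, "col"]]     # each NA gets its own column's mean
```

Because every NA is replaced by its column's observed mean, the column means of the imputed matrix equal `col_means`.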