
R Regression imputation on missing data


Hi! I'm trying to apply regression imputation to the missing values in the 'chmiss' dataset from the 'faraway' package, but my code fails when it fits the regression and drops columns from the data frame in the same step. Could anyone give me a hand correcting the code?

X <- chmiss
for(j in c(1:4,6)){
     new_Y <- X[,j]
     new_X <- X[,c(-j,-5)]
     new_XY <- cbind(new_X,new_Y)
     temp_lm <- lm(new_Y~.,data=new_XY)
     X[is.na(new_Y),j] <- predict(temp_lm,new_X[is.na(new_Y),c(-j,-5)])
}

Solution

  • Try this:

    library(faraway)
    data(chmiss)
    X <- chmiss
    for(j in c(1:4,6)){
      new_Y <- X[,j]
      new_X <- X[,c(-j,-5)]
      new_XY <- cbind(new_X,new_Y)
      temp_lm <- lm(new_Y~.,data=new_XY)
      X[is.na(new_Y),j] <- predict(temp_lm,new_X[is.na(new_Y),]) ## difference here
    }
    

    You already removed columns j and 5 when you created new_X, so indexing with c(-j,-5) again in the predict call drops a column the model still needs as a predictor.
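
    To see the double drop in isolation, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical stand-ins for chmiss, and column 5 plays the role of the excluded response. It only illustrates the indexing behaviour and the corrected predict call; it is not a full imputation routine.

```r
# Toy data (hypothetical values) standing in for chmiss; column 5 plays the
# role of the response that is left out of the imputation models.
toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(1, 3, 2, 5, 4, 6),
  d = c(6, 4, 5, 1, 3, 2),
  y = c(3, 1, 4, 1, 5, 9)
)

j <- 1
new_X <- toy[, c(-j, -5)]            # drops a and y, leaving b, c, d
once  <- names(new_X)
twice <- names(new_X[, c(-j, -5)])   # drops b as well; -5 is out of range here and is ignored

# Fit on the complete rows, then predict the missing entry using all of new_X.
new_XY  <- cbind(new_X, new_Y = toy[, j])
temp_lm <- lm(new_Y ~ ., data = new_XY)
miss    <- is.na(toy[, j])
imputed <- predict(temp_lm, new_X[miss, ])
```

    Comparing once and twice shows that the second c(-j,-5) removes b, a predictor the fitted model expects, which is why predict cannot find it in the original code.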