Tags: r, imputation, r-mice

Using imputation models created from amelia or mice in R for new data


Suppose I run one of the R packages for missing-value imputation, amelia or mice (or similar), on a large data frame -- let's say 100000 rows and 50 columns -- to get imputations for one particular column with some (let's say 200) NAs in it.

Is there a way to save the derived imputation algorithm so that when I get new data with 1000 new rows, I can simply apply the algorithm to that new data?

The goal is to impute any new NAs in the new data set using the same algorithm that was derived from the base data.

Thank you in advance -- if this isn't clear, I'm happy to answer any questions.


Solution

  • caret comes close to what you want. This assumes the new data has the same variables (column names) as the data the imputation was fit on. In my experience, though, imputations from caret and mice differ in accuracy.

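    library(caret)
    # Base data: B (length 600) is recycled to match the 1800 rows of A
    mydata <- data.frame(A = c(rep(NA, 900), rep(3, 900)),
                         B = c(rep(NA, 200), rep(3, 400)))
    # New data must use the same column names as the base data
    newdata <- data.frame(A = c(NA, 1, 2), B = c(5, NA, 7))
    # Learn the column medians from the base data only
    prep <- preProcess(mydata, method = "medianImpute")
    df_new <- predict(prep, mydata)     # impute the base data
    head(df_new)
    df_new2 <- predict(prep, newdata)   # impute new rows with the stored medians
    df_new2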
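
To illustrate the "fit once, save, apply later" part of the question, here is a minimal base R sketch of the same idea. It assumes simple median imputation, and the helper names `fit_median_imputer` and `apply_median_imputer` are hypothetical (they are not part of caret, mice, or amelia). It is meant to show the pattern, not to reproduce what those packages do internally.

```r
# Hypothetical helpers for illustration only (not from caret, mice, or amelia).

# Learn one median per numeric column from the base data.
fit_median_imputer <- function(data) {
  numeric_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  vapply(data[numeric_cols], median, numeric(1), na.rm = TRUE)
}

# Fill NAs in new data using the medians learned from the base data.
apply_median_imputer <- function(medians, newdata) {
  for (col in intersect(names(medians), names(newdata))) {
    missing <- is.na(newdata[[col]])
    newdata[[col]][missing] <- medians[[col]]
  }
  newdata
}

base_data <- data.frame(A = c(1, 2, NA, 4, 100),
                        B = c(10, NA, 30, 40, 50))
medians <- fit_median_imputer(base_data)

# Persist the fitted values so a later session can reuse them.
model_path <- tempfile(fileext = ".rds")
saveRDS(medians, model_path)

# Later: reload the stored values and impute new rows without refitting.
stored <- readRDS(model_path)
new_data <- data.frame(A = c(NA, 7), B = c(5, NA))
imputed <- apply_median_imputer(stored, new_data)
print(imputed)
```

Since `saveRDS()` and `readRDS()` work on any R object, the same pattern should also apply to the `prep` object returned by `preProcess()`: save it after fitting, reload it when the new rows arrive, and call `predict()` on them.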