Imputation of specific columns with mice()


I would like to impute missing data with the mice package. My dataset contains the columns "A" to "G", but I only want to impute the values in columns C and D.

This article (https://www.r-bloggers.com/2016/06/handling-missing-data-with-mice-package-a-simple-approach/) explains how to exclude variables from being used as predictors or from being imputed, but I would like to use mice the other way round: I want to specify which variables ARE imputed, so that only C and D are imputed.

Is this possible?

Thank you!


Solution

  • Answer

    Just invert the logic: in the method vector, set the entry of every variable that is not one of your variables of interest to "":

    meth[!names(meth) %in% c("C", "D")] <- ""
    

    Example: Only impute Petal.Length and Petal.Width

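    library(mice)

    # Introduce missing values into iris (prop = 0.1)
    data <- ampute(iris, prop = 0.1)$amp
    # Dry run: zero iterations, just returns the default setup
    init <- mice(data, maxit = 0)
    meth <- init$method
    # Blank out the method of every column except the two targets
    meth[!names(meth) %in% c("Petal.Length", "Petal.Width")] <- ""
    mice(data, method = meth)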
    

    Rationale

    You can supply a vector to the method argument of mice::mice. This vector holds the imputation method for each variable you want to impute. The linked article first does a dry run (init <- mice(data, maxit = 0)), whose output contains a preset method vector (init$method). For my example, it looks like this:

    Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
           "pmm"        "pmm"        "pmm"        "pmm"        "pmm"
    

    You can prevent a variable from being imputed by setting its method to "". This is one way to exclude variables. As my example shows, you can invert that logic and blank out every entry except the variables you want to impute.
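
    The inversion can also be wrapped in a small helper. This is only a sketch: impute_only is a hypothetical name, not part of mice, and the method vector below is typed by hand as a stand-in for init$method so that the snippet runs without the package. The helper also stops on misspelled column names, which the one-liner would silently ignore.

```r
# Hypothetical helper (not part of mice): keep the preset method for
# `targets` and set the method of every other variable to "".
impute_only <- function(meth, targets) {
  unknown <- setdiff(targets, names(meth))
  if (length(unknown) > 0) {
    stop("Not in the method vector: ", paste(unknown, collapse = ", "))
  }
  meth[!names(meth) %in% targets] <- ""
  meth
}

# Hand-written stand-in for init$method from the iris example
meth <- c(Sepal.Length = "pmm", Sepal.Width = "pmm", Petal.Length = "pmm",
          Petal.Width = "pmm", Species = "pmm")

meth_selected <- impute_only(meth, c("Petal.Length", "Petal.Width"))
print(meth_selected)
```

    With the real package, the result would be passed on as mice(data, method = impute_only(init$method, c("C", "D"))).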