
MICE imputation


I have a dataset like this

structure(list(age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, 
NA, NA, NA), sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 
0, 1), diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA
), hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1), 
    hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 
    0, 0, 0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-13L))

Could you please tell me how I can perform MICE imputation? I want to impute all the missing values. I tried following tutorials on the Internet, but I either get errors or not everything gets imputed. Just the code with an example is enough; I will adjust the settings later.


Solution

  • Here is an example to start from. The mice() function uses its default settings unless told otherwise, so I only set the most important arguments: 'm', the number of imputed datasets to generate; 'maxit', the number of iterations to run for each imputed dataset; and 'method', the imputation method, here predictive mean matching ('pmm'). For a complete explanation of these and the other options, see ?mice; then you can decide how to adjust them.

    Importing your data:

    df<- structure(list(age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, 
                           NA, NA, NA), sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 
                                                0, 1), diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA
                                                ), hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1), 
                   hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 
                                            0, 0, 0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                                               -13L))
    

    Start the imputation with the mice() function, for example:

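    library(mice) # provides mice() and complete()

    imp <- mice(df
                 ,m = 10 # number of imputed datasets
                 ,maxit = 10 # iterations per imputed dataset
                 ,method = 'pmm' # predictive mean matching
                 ,printFlag = FALSE # do not show imputation process
                  ) 
    
    # Printing the imp object gives a summary of the imputation.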
    imp
    

    The imputed datasets can be extracted with the complete() function.

    miceOutput <- complete(imp, action='long') # generate all completed data sets in long format
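
    Since the question mentions that not everything gets imputed, it may help to count the missing values before and after. The following is only a sketch: it rebuilds the same values as a plain data.frame, and the seed value 123 and the names na_before, first and na_after are my own choices for illustration. With a dataset this small, mice may skip variables it flags as constant or collinear, and it records such cases in imp$loggedEvents.

    ```r
    library(mice)

    # Same values as in the question, rebuilt as a plain data.frame
    df <- data.frame(
      age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, NA, NA, NA),
      sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 0, 1),
      diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA),
      hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1),
      hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 0, 0, 0)
    )

    # Missing values per column before imputation
    na_before <- colSums(is.na(df))
    print(na_before)

    # seed = 123 is an arbitrary value, used only to make the run reproducible
    imp <- mice(df, m = 10, maxit = 10, method = "pmm",
                seed = 123, printFlag = FALSE)

    # Extract only the first completed dataset
    first <- mice::complete(imp, 1)

    # Missing values per column after imputation;
    # a non-zero count means mice did not impute that variable
    na_after <- colSums(is.na(first))
    print(na_after)

    # Variables or predictors that mice dropped are logged here (NULL if none)
    print(imp$loggedEvents)
    ```

    If a column still shows missing values, the loggedEvents table is the first place I would look for the reason.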
    

    The imputed datasets can then be used with mice to conduct pooled analyses, or stored for later use. Hope this helps.
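
    As a rough sketch of what a pooled analysis could look like: with() fits the same model on every imputed dataset and pool() combines the estimates using Rubin's rules. The model age ~ hypertension and the seed value are assumptions chosen only for illustration, not a recommendation for your data, and 13 rows are far too few for meaningful estimates.

    ```r
    library(mice)

    # Same values as in the question, rebuilt as a plain data.frame
    df <- data.frame(
      age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, NA, NA, NA),
      sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 0, 1),
      diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA),
      hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1),
      hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 0, 0, 0)
    )

    # seed = 123 is an arbitrary value, used only to make the run reproducible
    imp <- mice(df, m = 10, maxit = 10, method = "pmm",
                seed = 123, printFlag = FALSE)

    # Fit the same linear model on each of the 10 imputed datasets
    # (illustrative model, chosen only to show the mechanics)
    fit <- with(imp, lm(age ~ hypertension))

    # Combine the 10 sets of estimates into one pooled result
    pooled <- pool(fit)
    print(summary(pooled))
    ```

    To store the result for later use, saveRDS(imp, "imp.rds") keeps the whole mids object, where the file name is just an example.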