r, regression, imputation, r-mice

Read an already multiply imputed dataset with mice (in R)


I have a dataset that has already been multiply imputed, with the following structure:

clientID <- c(4,4,4,4,4,6,6,6,6,6,7,7,7,7,7,15,15,15,15,15)
impID <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
x <- c(1534500, 1572500, 1555500, 1571500, 1546500, 113000, 113000, 113000, 113000,113000, 4153101,4153101,4153101,4153101,4153101, 1042400, 1044400, 1092400, 1057400, 1051400)
y <- c(14200,14200,14200,14200,14200,160000,15000,14000,14200,4800,12000,14200,10500,14200,48000,150000,150000,150000,150000,150000)
z <- c(200, 200,200,200,200, 400,400,400,400,400,150,150,150,150,150,230,230,230,230,230)
data <- data.frame(clientID=clientID, impID = impID, x=x, y=y, z=z)

regs <- with(data, lm(x ~ y + z))

I want to run a regression that gives me one result per imputation (i.e. one per impID value, so 5 regressions in total). Unfortunately, the regression function does not differentiate between the imputation IDs and runs a single regression on the whole dataset.

The mice package is generally used to impute data; the resulting imputed object can then be passed to with(data, lm(x ~ y + z)), which yields several regressions, one per imputation.

I wonder how I can convert the already imputed data into a form that is recognized as imputed data, without changing the values in the data. That way I would be able to run the regression and get 5 results.

I'm grateful for any help!

I tried to create an imputed dataset with mice, but because the dataset is already imputed, the generated data was extended with new entries.


Solution

  • You can split the data on impID using split(), which returns a list of data frames, one per imputation. You can then turn that list into an imputationList (from the mitools package), run the regression on it with with(), and pool the results with MIcombine().

    library(mitools)
    clientID <- c(4,4,4,4,4,6,6,6,6,6,7,7,7,7,7,15,15,15,15,15)
    impID <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
    x <- c(1534500, 1572500, 1555500, 1571500, 1546500, 113000, 113000, 113000, 113000,113000, 4153101,4153101,4153101,4153101,4153101, 1042400, 1044400, 1092400, 1057400, 1051400)
    y <- c(14200,14200,14200,14200,14200,160000,15000,14000,14200,4800,12000,14200,10500,14200,48000,150000,150000,150000,150000,150000)
    z <- c(200, 200,200,200,200, 400,400,400,400,400,150,150,150,150,150,230,230,230,230,230)
    data <- data.frame(clientID=clientID, impID = impID, x=x, y=y, z=z)
    
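    # Split into a list of data frames, one per imputation
    sp <- split(data, data$impID)
    # Wrap the list so mitools treats it as a set of imputations
    implist <- imputationList(sp)
    
    # Fit the same model to each imputed data frame
    regs <- with(implist, lm(x ~ y + z))
    
    # Pool the per-imputation estimates and standard errors
    summary(MIcombine(regs))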
    #> Multiple imputation results:
    #>       with(implist, lm(x ~ y + z))
    #>       MIcombine.default(regs)
    #>                   results           se       (lower       upper) missInfo
    #> (Intercept)  5.294024e+06 2.300637e+06 781764.53408 9.806284e+06      5 %
    #> y           -7.903431e+00 1.362453e+01    -34.60722 1.880036e+01      0 %
    #> z           -1.279535e+04 9.454923e+03 -31352.04175 5.761333e+03      7 %
    

    Created on 2024-02-06 with reprex v2.0.2
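
If you want to see the five individual regressions, or check what with() and MIcombine() are doing, the same steps can be sketched in base R alone. This is a minimal sketch, not the mitools implementation: the names fits, coefs, vars, pooled_est and pooled_se are made up for illustration, and the pooling step assumes Rubin's rules (mean of the estimates; within-imputation variance plus (1 + 1/m) times the between-imputation variance).

```r
# Same example data as in the question.
clientID <- c(4,4,4,4,4,6,6,6,6,6,7,7,7,7,7,15,15,15,15,15)
impID <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
x <- c(1534500, 1572500, 1555500, 1571500, 1546500,
       113000, 113000, 113000, 113000, 113000,
       4153101, 4153101, 4153101, 4153101, 4153101,
       1042400, 1044400, 1092400, 1057400, 1051400)
y <- c(14200, 14200, 14200, 14200, 14200,
       160000, 15000, 14000, 14200, 4800,
       12000, 14200, 10500, 14200, 48000,
       150000, 150000, 150000, 150000, 150000)
z <- c(200, 200, 200, 200, 200, 400, 400, 400, 400, 400,
       150, 150, 150, 150, 150, 230, 230, 230, 230, 230)
data <- data.frame(clientID = clientID, impID = impID, x = x, y = y, z = z)

# One lm() fit per imputation; the list is named by impID ("1" to "5").
fits <- lapply(split(data, data$impID), function(d) lm(x ~ y + z, data = d))

# Coefficients and squared standard errors, one row per imputation.
coefs <- t(sapply(fits, coef))
vars  <- t(sapply(fits, function(f) diag(vcov(f))))

# Pooling by Rubin's rules (assumed here to mirror what MIcombine() reports).
m          <- length(fits)
pooled_est <- colMeans(coefs)        # pooled point estimates
within     <- colMeans(vars)         # average within-imputation variance
between    <- apply(coefs, 2, var)   # between-imputation variance
pooled_se  <- sqrt(within + (1 + 1 / m) * between)

print(coefs)                         # the five separate regressions
print(cbind(estimate = pooled_est, se = pooled_se))
```

Use summary(fits[["3"]]) to inspect a single imputation's regression in full. With only four clients per imputation, each fit has a single residual degree of freedom, so treat the numbers as a demonstration of the mechanics rather than a meaningful model.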