How to apply a model other than lm or glm to multiply imputed data?


I have a dataset containing a person id and the 7 answers to a questionnaire called sevenup:

> names(sevup_mice_data)
[1] "record_id"  "sevenup_01" "sevenup_02" "sevenup_03" "sevenup_04" "sevenup_05" "sevenup_06" "sevenup_07"

All answers are numbers between 0 and 5. There are missing values in column sevenup_06, so I want to use mice to impute it.

Here is what I have done so far:

sevup_mice <- mice(sevup_mice_data, m = 5, method = "pmm", seed = 0, 
                   predictorMatrix = quickpred(sevup_mice_data, exclude = "record_id"))

Now, in most mice tutorials I have seen, people use a linear model and get the fit parameters, and then join the results using pool, for example something like:

fit <- with(sevup_mice, expr = lm(sevenup_05 ~ sevenup_04 + sevenup_06))
pool(fit)

However, I do not need to fit an lm to my data. I only want a final score for each person, namely the sum of that person's answers to all the questions.

If I didn't impute data, I would calculate it like this:

sevup_mice_data$sevup_score <- rowSums(sevup_mice_data[2:ncol(sevup_mice_data)], na.rm=TRUE)

So I would like to do that for each of the 5 imputed datasets contained in sevup_mice. Is there a way to do that without a loop, for example with the with function? And after that, can I aggregate the results with pool, given that the results of my analysis are not fitted parameters but single columns?


Solution

  • Let's try this:

    library(mice)
    set.seed(100)
    mat = matrix(rnorm(100,rep(1:10,10)),ncol=10)
    mat[sample(length(mat),20)]<-NA
    

    Then we impute:

    imp = mice(mat,m = 5, method = "pmm")
    

    There is a function called complete that fills in the missing values of the matrix, once for each imputation:

    impdata = complete(imp,"all")
    head(impdata[[1]])
            V1       V2       V3        V4        V5        V6        V7        V8
    1 5.116971 8.086186 0.561910 0.9088864 0.8983708 0.5529378 0.7380042 6.0127497
    2 6.318630 2.096274 2.764061 3.8888065 4.4777166 0.2614021 6.5819589 0.9356443
    3 2.921083 8.086186 3.261961 2.8620704 1.2232244 3.1788648 2.6211164 2.9379040
    4 4.886785 6.611146 4.773405 3.8888065 4.6228674 5.8974657 6.5819589 2.9379040
    5 5.116971 5.123380 4.185621 4.3099857 4.4777166 2.7280745 5.1298341 2.9379040
    6 6.318630 5.970683 5.561549 5.7782058 7.3222310 6.9804641 5.2869750 6.0127497
            V9       V10
    1 1.896822 0.4428777
    2 5.842095 3.4283014
    3 1.654651 7.8213169
    4 2.068788 2.8424288
    5 5.709582 4.4697035
    6 5.842095 0.4428777
    

    If you want to compute rowSums on each imputed dataset, which gives one column of sums per imputation, you do:

    sapply(impdata,rowSums)
                 1        2        3        4        5
     [1,] 25.21572 25.27762 26.85518 18.89534 23.55415
     [2,] 36.59489 44.62157 43.48562 48.05143 35.17675
     [3,] 36.56838 34.46168 31.17314 30.25396 32.26478
     [4,] 45.11155 47.54594 46.59836 47.54594 45.11155
     [5,] 44.18877 44.18877 44.18877 44.18877 44.18877
     [6,] 55.51646 62.89490 63.89955 57.91601 58.50188
     [7,] 65.75129 68.00360 70.00043 65.89644 68.00360
     [8,] 77.44877 83.87630 86.05698 86.05698 87.27713
     [9,] 86.65979 91.35599 89.35916 86.65979 90.15827
    [10,] 85.19222 90.37659 84.34492 86.62083 88.81410
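
The same steps can be applied to the data from the question. The sketch below is only an illustration: the real sevup_mice_data is not available, so it builds a hypothetical stand-in with the same column names (50 people, answers from 0 to 5, missing values only in sevenup_06). The last step, averaging the five scores per person with rowMeans, is one possible way to get a single score and is an assumption here, not something pool does. As far as I know, pool combines estimates from fitted models using Rubin's rules, so it does not apply directly to a plain column of sums.

```r
library(mice)

# Hypothetical stand-in for sevup_mice_data (the real data is not shown in the question).
set.seed(1)
n <- 50
items <- sprintf("sevenup_%02d", 1:7)
latent <- rnorm(n)  # shared trait so the items are correlated with each other
answers <- sapply(1:7, function(j) pmin(5, pmax(0, round(2.5 + latent + rnorm(n)))))
sevup_mice_data <- data.frame(record_id = seq_len(n), answers)
names(sevup_mice_data)[-1] <- items
sevup_mice_data$sevenup_06[sample(n, 10)] <- NA

# Same imputation call as in the question (printFlag only silences the log).
sevup_mice <- mice(sevup_mice_data, m = 5, method = "pmm", seed = 0,
                   predictorMatrix = quickpred(sevup_mice_data, exclude = "record_id"),
                   printFlag = FALSE)

# One completed data frame per imputation.
completed <- complete(sevup_mice, "all")

# Sum the item columns only (record_id is left out): one column of scores per imputation.
scores <- sapply(completed, function(d) rowSums(d[items]))

# Assumption: average the scores across imputations to get one score per person.
sevup_mice_data$sevup_score <- rowMeans(scores)
head(sevup_mice_data[c("record_id", "sevup_score")])
```

For people with no missing answers the five columns of scores are identical, so the average is simply their observed sum. If the score is later used in a model, it may be preferable to compute it inside each completed dataset and fit the model per imputation, so that pool can combine the model estimates.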