r linear-regression missing-data imputation r-mice

Can I conduct pooled regression analysis on only a subsample of a dataset imputed with MICE in R?


I conducted multiple imputation using the 'mice' package in R. Afterwards, I calculated pooled regression analyses using the 'with' and 'pool' functions.

For further analyses, I want to look at only a specific subsample of the data, and I would like to run the pooled regression analysis on the imputed data for that subsample as well.

However, I am struggling to find a way to achieve that. Pooled regression analysis in 'mice' works by using the 'with' and 'lm' functions on an object of class 'mids', instead of just calling 'lm' on a data frame. Therefore, I can't subset the data by conventional means, such as square brackets or the 'subset' function.

I know that I could theoretically just extract the imputed datasets using the 'complete' function, conduct regression analyses on these datasets, and then pool the results by hand, but I would like to avoid that.
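
For reference, a minimal sketch of that manual route. It is only an illustration under stated assumptions: the built-in `nhanes` data and the condition `age >= 2` stand in for the real data and subgroup, and the model `bmi ~ chl` is arbitrary. The pooling step itself does not have to be done by hand, because `as.mira()` wraps a list of fits so that `pool()` can combine them.

```r
library(mice)

# Assumption: the built-in `nhanes` data and the condition `age >= 2`
# are stand-ins for the real data set and subgroup.
imp <- mice(nhanes, m = 5, maxit = 5, seed = 12, printFlag = FALSE)

# Fit the model on the chosen rows of each completed data set.
fits <- lapply(seq_len(imp$m), function(i) {
  completed <- complete(imp, i)
  lm(bmi ~ chl, data = completed[completed$age >= 2, ])
})

# as.mira() wraps the list of fits so that pool() can combine them.
pooled_manual <- pool(as.mira(fits))
summary(pooled_manual)
```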

An example of what I want to do would be:

library(mice)

data <- as.data.frame(matrix(data = c(3, 2, 3, 4, 5, NA, 7, 10, 9, NA, NA, 12, 13, 14, 15, 16, NA, 18), nrow = 6))
names(data) <- c("a", "b", "c")
data$Sex <- c("male", "male", "female", "male", "female", "female")

imp <- mice(data = data,
            m = 20,
            maxit = 10,
            seed = 12,
            print = FALSE)

Now, I can conduct pooled regression analysis by using:

summary(pool(with(imp, lm(a ~ b + c))))

What I am struggling to achieve is conducting a regression analysis on only the male subjects.


Solution

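  • mice returns an object of class mids. Recent versions of mice provide a filter() method for such objects. filter() is the dplyr generic, so dplyr must be loaded; you can then subset the rows with a logical condition:

    library(dplyr)

    filter(imp, Sex %in% "male")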
    
    # or for more detail
    imp_filtered <- filter(imp, Sex %in% "male")
    imp_filtered$data
    
    #  a  b  c  Sex
    #1 3  7 13 male
    #2 2 10 14 male
    #4 4 NA 16 male
    

    So to implement this, you can save a filtered object or modify your code slightly:

    # save filtered data to new object
    
    imp_filtered <- filter(imp, Sex %in% "male")
    summary(pool(with(imp_filtered, lm(a ~ b + c))))
    
    # or all in one go
    
    summary(pool(with(filter(imp, Sex %in% "male"), lm(a ~ b + c))))
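
As a self-contained check of the approach above, here is a minimal sketch that can be run without the question's toy data. The built-in `nhanes` data, the condition `age >= 2`, and the model `bmi ~ chl` are assumptions chosen for illustration only. It calls `dplyr::filter()` explicitly, since a bare `filter()` resolves to `stats::filter()` when dplyr is not attached.

```r
library(mice)

# Assumption: the built-in `nhanes` data and the condition `age >= 2` are
# stand-ins for the question's data and its `Sex %in% "male"` filter.
imp <- mice(nhanes, m = 5, maxit = 5, seed = 12, printFlag = FALSE)

# filter() is the dplyr generic; the explicit dplyr:: prefix avoids
# calling stats::filter() when dplyr is not attached.
imp_sub <- dplyr::filter(imp, age >= 2)

# The subset keeps all m imputations but only the matching rows.
n_kept <- nrow(complete(imp_sub, 1))
n_expected <- sum(nhanes$age >= 2)

# Fit and pool on the subset exactly as on the full mids object.
pooled_sub <- pool(with(imp_sub, lm(bmi ~ chl)))
summary(pooled_sub)
```

Comparing `n_kept` with `n_expected` is a quick way to confirm that the filtered mids object contains only the intended rows before trusting the pooled estimates.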