
Descriptive stats for MI data in R: Take 3


As an R beginner, I have found it surprisingly difficult to figure out how to compute descriptive statistics on multiply imputed data (more so than running some of the other basic analyses, such as correlations and regressions).

These types of questions are prefaced with apologies (Descriptive statistics (Means, StdDevs) using multiply imputed data: R), but they have either not been answered (https://stats.stackexchange.com/questions/296193/pooling-basic-descriptives-from-several-multiply-imputed-datasets-using-mice) or are quickly downvoted.

Here is the documentation for a miceadds function (https://www.rdocumentation.org/packages/miceadds/versions/2.10-14/topics/stats0), which I find difficult to follow with data that has been stored in the mids format.

I have gotten some output (mean, median, min, max) using summary(complete(imp)), but would love to know how to get additional summary output (e.g., skew/kurtosis, standard deviation, variance).

Illustration borrowed from a previous poster above:

  > imp <- mice(nhanes, seed = 23109)

    iter imp variable
    1   1  bmi  hyp  chl
    1   2  bmi  hyp  chl
    1   3  bmi  hyp  chl
    1   4  bmi  hyp  chl
    1   5  bmi  hyp  chl
    2   1  bmi  hyp  chl
    2   2  bmi  hyp  chl
    2   3  bmi  hyp  chl

  > summary(complete(imp))
   age         bmi        hyp         chl     
   1:12   Min.   :20.40   1:18   Min.   :113  
   2: 7   1st Qu.:24.90   2: 7   1st Qu.:186  
   3: 6   Median :27.40          Median :199  
          Mean   :27.37          Mean   :194  
          3rd Qu.:30.10          3rd Qu.:218  
          Max.   :35.30          Max.   :284  

Would someone kindly take the time to illustrate how one might take the mids object to get the basic descriptives?


Solution

  • The steps below show what each R object looks like at every stage, which should make it clearer where the imputed values live and how to get at them. I would also recommend looking at this tutorial: https://gerkovink.github.io/miceVignettes/

    library(mice)
    
    # The nhanes object is just a simple data frame:
    data(nhanes)
    str(nhanes)
    # 'data.frame':  25 obs. of  4 variables:
    #  $ age: num  1 2 1 3 1 3 1 1 2 2 ...
    #  $ bmi: num  NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
    #  $ hyp: num  NA 1 1 NA 1 NA 1 1 1 NA ...
    #  $ chl: num  NA 187 187 NA 113 184 118 187 238 NA ...
    
    # Generate the multiple imputations with mice() (m = 5 datasets by default)
    imp <- mice(nhanes, seed=23109)
    
    # The result is an object of class "mids", which you can explore with str()
    str(imp)
    # List of 17
    #  $ call           : language mice(data = nhanes)
    #  $ data           :'data.frame':  25 obs. of  4 variables:
    #   ..$ age: num [1:25] 1 2 1 3 1 3 1 1 2 2 ...
    #   ..$ bmi: num [1:25] NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
    #   ..$ hyp: num [1:25] NA 1 1 NA 1 NA 1 1 1 NA ...
    #   ..$ chl: num [1:25] NA 187 187 NA 113 184 118 187 238 NA ...
    #  $ m              : num 5
    #  ...
    #  $ imp            :List of 4
    #   ..$ age: NULL
    #   ..$ bmi:'data.frame':    9 obs. of  5 variables:
    #   .. ..$ 1: num [1:9] 28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
    #   .. ..$ 2: num [1:9] 27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
    #   .. ..$ 3: num [1:9] 22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
    #   .. ..$ 4: num [1:9] 27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
    #   .. ..$ 5: num [1:9] 28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7
    #  ...
    
    
    # Individual components of this object can be extracted with $.
    # For example, the imputed values for the bmi column
    # (one row per missing case, one column per imputation):
    imp$imp$bmi
    #       1    2    3    4    5
    # 1  28.7 27.2 22.5 27.2 28.7
    # 3  30.1 30.1 30.1 22.0 28.7
    # 4  22.7 27.2 20.4 22.7 20.4
    # 6  24.9 25.5 22.5 21.7 21.7
    # 10 30.1 29.6 27.4 25.5 25.5
    # 11 35.3 26.3 22.0 27.2 22.5
    # 12 27.5 26.3 26.3 24.9 22.5
    # 16 29.6 30.1 27.4 30.1 25.5
    # 21 33.2 30.1 35.3 22.0 22.7
    
    # The above output is again just a regular data frame:
    str(imp$imp$bmi)
    # 'data.frame':  9 obs. of  5 variables:
    #  $ 1: num  28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
    #  $ 2: num  27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
    #  $ 3: num  22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
    #  $ 4: num  27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
    #  $ 5: num  28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7
    
    # complete() returns a completed dataset; by default (action = 1)
    # this is only the first of the m imputed datasets:
    mat <- complete(imp)
    
    # The output of this function is a regular data frame:
    str(mat)
    # 'data.frame':  25 obs. of  4 variables:
    # $ age: num  1 2 1 3 1 3 1 1 2 2 ...
    # $ bmi: num  28.7 22.7 30.1 22.7 20.4 24.9 22.5 30.1 22 30.1 ...
    # $ hyp: num  1 1 1 2 1 2 1 1 1 1 ...
    # $ chl: num  199 187 187 204 113 184 118 187 238 229 ...
    
    # So you can run any descriptive statistics you need on this object,
    # just as you would with a regular data frame:
    summary(mat)
    # age            bmi             hyp            chl       
    # Min.   :1.00   Min.   :20.40   Min.   :1.00   Min.   :113.0  
    # 1st Qu.:1.00   1st Qu.:24.90   1st Qu.:1.00   1st Qu.:187.0  
    # Median :2.00   Median :27.50   Median :1.00   Median :204.0  
    # Mean   :1.76   Mean   :27.48   Mean   :1.24   Mean   :204.9  
    # 3rd Qu.:2.00   3rd Qu.:30.10   3rd Qu.:1.00   3rd Qu.:229.0  
    # Max.   :3.00   Max.   :35.30   Max.   :2.00   Max.   :284.0
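
The walkthrough above describes only one completed dataset, and summary() does not report the standard deviation, variance, skewness or kurtosis asked about in the question. The following is a minimal sketch of one way to get those for every imputation and then combine them. It rests on two assumptions that are not from the original answer: the helper names (skew, kurt, describe) are hypothetical and use plain moment-based formulas, and the combined value is a simple average of each statistic across the m completed datasets, which gives a pooled point estimate but no pooled standard error.

```r
library(mice)

# Same imputation as in the walkthrough above (m = 5 by default);
# printFlag = FALSE only suppresses the iteration log.
imp <- mice(nhanes, seed = 23109, printFlag = FALSE)

# Moment-based skewness and excess kurtosis, written out by hand so that
# no extra package is needed (hypothetical helpers, not part of mice).
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Descriptives for one numeric vector.
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), var = var(x),
    median = median(x), skew = skew(x), kurtosis = kurt(x))
}

# One completed data frame per imputation.
completed <- lapply(seq_len(imp$m), function(i) complete(imp, action = i))

# For each completed dataset: a statistics-by-variables matrix.
per_imp <- lapply(completed, function(d) sapply(d, describe))

# Average the matrices element-wise across the m imputations.
pooled <- Reduce(`+`, per_imp) / imp$m
round(pooled, 2)
```

Each element of per_imp can be inspected on its own to see how much a statistic varies between imputations. The sketch assumes every column is numeric, as in nhanes; factor columns would need to be dropped or summarised separately.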