Search code examples
rnalogistic-regressionpredictmultinomial

how to get the exact observations analyzed in multinom() in R, or how to make fitted() produce a fit on entire data including NAs


My question is similar to another older question "How to get the number of observations included in a model created using the function multinom in R?" but what I want to look for are the exact observations, not the number of observations, analyzed in the model. Ultimately I want to have the original dataset combined with a new column of the predicted (fitted) probabilities. But let me use an example to illustrate my problem:

If my sample is 1000, some variables have NA values,and I fit a multinom() in R, and use fitted(), then find the length of fitted() is only 870, which means 130 obs are excluded when the model is estimated. Now, the fitted() only generates a 870*1 (i.e. one column) of numbers (probabilities), there is no way for me to know which observation does each probability number corresponds to. I think there're two ways to solve this:

  1. Find out which observations are excluded and delete them in the raw data before estimating the model.
  2. Try to let the fitted() produce a 1000*1 matrix with 130 elements being NA.

I don't know the answer to either one. Any advice would be appreciated. The ultimate goal is to be able to append the fitted probabilities to the original dataset (as a new column) so I can draw inferences. Thanks.


Solution

  • from ?multinom in nnet:

       model: logical. If true, the model frame is saved as component
              'model' of the returned object.
    

    so call multinom(..., model=TRUE), and the model frame will be in the result.

    EDIT:

    Following the example in ?multinom:

    options(contrasts = c("contr.treatment", "contr.poly"))
    library(MASS)
    example(birthwt)
    bwt.mu <- multinom(low ~ ., bwt, model=TRUE)
    

    Viewing the model frame inside the object:

    >     str(bwt.mu$model)
    'data.frame':   189 obs. of  9 variables:
     $ low  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
     $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
     $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
     $ race : Factor w/ 3 levels "white","black",..: 2 3 1 1 1 3 1 3 1 1 ...
     $ smoke: logi  FALSE FALSE TRUE TRUE TRUE FALSE ...
     $ ptd  : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
     $ ht   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
     $ ui   : logi  TRUE FALSE FALSE TRUE TRUE FALSE ...
     $ ftv  : Factor w/ 3 levels "0","1","2+": 1 3 2 3 1 1 2 2 2 1 ...
     - attr(*, "terms")=Classes 'terms', 'formula' length 3 low ~ age + lwt + race + smoke + ptd + ht + ui + ftv
      .. ..- attr(*, "variables")= language list(low, age, lwt, race, smoke, ptd, ht, ui, ftv)
      .. ..- attr(*, "factors")= int [1:9, 1:8] 0 1 0 0 0 0 0 0 0 0 ...
      .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. ..$ : chr [1:9] "low" "age" "lwt" "race" ...
      .. .. .. ..$ : chr [1:8] "age" "lwt" "race" "smoke" ...
      .. ..- attr(*, "term.labels")= chr [1:8] "age" "lwt" "race" "smoke" ...
      .. ..- attr(*, "order")= int [1:8] 1 1 1 1 1 1 1 1
      .. ..- attr(*, "intercept")= int 1
      .. ..- attr(*, "response")= int 1
      .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
      .. ..- attr(*, "predvars")= language list(low, age, lwt, race, smoke, ptd, ht, ui, ftv)
      .. ..- attr(*, "dataClasses")= Named chr [1:9] "factor" "numeric" "numeric" "factor" ...
      .. .. ..- attr(*, "names")= chr [1:9] "low" "age" "lwt" "race" ...