My question is similar to another older question "How to get the number of observations included in a model created using the function multinom in R?" but what I want to look for are the exact observations, not the number of observations, analyzed in the model. Ultimately I want to have the original dataset combined with a new column of the predicted (fitted) probabilities. But let me use an example to illustrate my problem:
If my sample is 1000, some variables have NA values,and I fit a multinom() in R, and use fitted(), then find the length of fitted() is only 870, which means 130 obs are excluded when the model is estimated. Now, the fitted() only generates a 870*1 (i.e. one column) of numbers (probabilities), there is no way for me to know which observation does each probability number corresponds to. I think there're two ways to solve this:
I don't know the answer to either one. Any advice would be appreciated. The ultimate goal is to be able to append the fitted probabilities to the original dataset (as a new column) so I can draw inferences. Thanks.
from ?multinom
in nnet:
model: logical. If true, the model frame is saved as component
'model' of the returned object.
so call multinom(..., model=TRUE)
, and the model frame will be in the result.
EDIT:
Following the example in ?multinom
:
options(contrasts = c("contr.treatment", "contr.poly"))
library(MASS)
example(birthwt)
bwt.mu <- multinom(low ~ ., bwt, model=TRUE)
Viewing the model frame inside the object:
> str(bwt.mu$model)
'data.frame': 189 obs. of 9 variables:
$ low : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ age : int 19 33 20 21 18 21 22 17 29 26 ...
$ lwt : int 182 155 105 108 107 124 118 103 123 113 ...
$ race : Factor w/ 3 levels "white","black",..: 2 3 1 1 1 3 1 3 1 1 ...
$ smoke: logi FALSE FALSE TRUE TRUE TRUE FALSE ...
$ ptd : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
$ ht : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ ui : logi TRUE FALSE FALSE TRUE TRUE FALSE ...
$ ftv : Factor w/ 3 levels "0","1","2+": 1 3 2 3 1 1 2 2 2 1 ...
- attr(*, "terms")=Classes 'terms', 'formula' length 3 low ~ age + lwt + race + smoke + ptd + ht + ui + ftv
.. ..- attr(*, "variables")= language list(low, age, lwt, race, smoke, ptd, ht, ui, ftv)
.. ..- attr(*, "factors")= int [1:9, 1:8] 0 1 0 0 0 0 0 0 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:9] "low" "age" "lwt" "race" ...
.. .. .. ..$ : chr [1:8] "age" "lwt" "race" "smoke" ...
.. ..- attr(*, "term.labels")= chr [1:8] "age" "lwt" "race" "smoke" ...
.. ..- attr(*, "order")= int [1:8] 1 1 1 1 1 1 1 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(low, age, lwt, race, smoke, ptd, ht, ui, ftv)
.. ..- attr(*, "dataClasses")= Named chr [1:9] "factor" "numeric" "numeric" "factor" ...
.. .. ..- attr(*, "names")= chr [1:9] "low" "age" "lwt" "race" ...