Suppose I have the following data set
df=data.frame(x1=rnorm(100), #predictor 1
x2=rpois(100,2.5), #predictor 2
x3=rgeom(100,prob = 0.48), #predictor 3
y=as.factor(sample(1:3,100,replace = T)) #categorical response
)
If I run the multinomial logistic regression by considering the 1
as the reference category, then the estimated parameters are
Call:
multinom(formula = y ~ ., data = df)
Coefficients:
(Intercept) x1 x2 x3
2 -0.71018723 -0.4193710 0.15820110 0.05849252
3 -0.05987773 -0.2978596 -0.08335957 0.10149408
I would like to calculate the loglikelihood value of the multinomial logistic regression using these estimated parameters.
Any help is appreciated.
This should work. The log-likelihood is just the sum of the log of the probabilities that each observation takes on its observed value. In the code below probs
is an N x m matrix of probabilities for each of the N observations on each of the m categories. We can then get y
from the model frame and turn it into a numeric variable which will indicate the category number. We then use cbind(1:length(y), y)
to index the probability matrix. This makes an N x 2 matrix that gives for each row number (in the first column) the column number of the probs
matrix that you should keep. So, probs[cbind(1:length(y), y)]
creates a vector of probabilities that each observation takes on its observed y value. We can log them and then sum them to get the log-likelihood.
df=data.frame(x1=rnorm(100), #predictor 1
x2=rpois(100,2.5), #predictor 2
x3=rgeom(100,prob = 0.48), #predictor 3
y=as.factor(sample(1:3,100,replace = T)) #categorical response
)
mod <- nnet::multinom(formula = y ~ ., data = df)
probs <- predict(mod, type="probs")
y <- as.numeric(model.response(model.frame(mod)))
indiv_ll <- log(probs[cbind(1:length(y), y)])
sum(indiv_ll)
# [1] -106.8012
logLik(mod)
# 'log Lik.' -106.8012 (df=8)