r logistic-regression multinomial log-likelihood

Calculation of log likelihood function of multinomial logistic regression in R

Suppose I have the following data set

df=data.frame(x1=rnorm(100), #predictor 1
              x2=rpois(100,2.5), #predictor 2
              x3=rgeom(100,prob = 0.48), #predictor 3
              y=as.factor(sample(1:3,100,replace = T)) #categorical response
              )

If I run the multinomial logistic regression by considering the 1 as the reference category, then the estimated parameters are

Call:
multinom(formula = y ~ ., data = df)

Coefficients:
  (Intercept)         x1          x2         x3
2 -0.71018723 -0.4193710  0.15820110 0.05849252
3 -0.05987773 -0.2978596 -0.08335957 0.10149408

I would like to calculate the loglikelihood value of the multinomial logistic regression using these estimated parameters.

Any help is appreciated.

Solution

This should work. The log-likelihood is just the sum of the log of the probabilities that each observation takes on its observed value. In the code below probs is an N x m matrix of probabilities for each of the N observations on each of the m categories. We can then get y from the model frame and turn it into a numeric variable which will indicate the category number. We then use cbind(1:length(y), y) to index the probability matrix. This makes an N x 2 matrix that gives for each row number (in the first column) the column number of the probs matrix that you should keep. So, probs[cbind(1:length(y), y)] creates a vector of probabilities that each observation takes on its observed y value. We can log them and then sum them to get the log-likelihood.

df=data.frame(x1=rnorm(100), #predictor 1
              x2=rpois(100,2.5), #predictor 2
              x3=rgeom(100,prob = 0.48), #predictor 3
              y=as.factor(sample(1:3,100,replace = T)) #categorical response
)

mod <- nnet::multinom(formula = y ~ ., data = df)

probs <- predict(mod, type="probs")
y <- as.numeric(model.response(model.frame(mod)))
indiv_ll <- log(probs[cbind(1:length(y), y)])
sum(indiv_ll)
# [1] -106.8012
logLik(mod)
# 'log Lik.' -106.8012 (df=8)