Search code examples
rlogistic-regressionmultinomiallog-likelihood

Calculation of log likelihood function of multinomial logistic regression in R


Suppose I have the following data set

df=data.frame(x1=rnorm(100), #predictor 1
              x2=rpois(100,2.5), #predictor 2
              x3=rgeom(100,prob = 0.48), #predictor 3
              y=as.factor(sample(1:3,100,replace = T)) #categorical response
              )

If I run the multinomial logistic regression by considering the 1 as the reference category, then the estimated parameters are

Call:
multinom(formula = y ~ ., data = df)

Coefficients:
  (Intercept)         x1          x2         x3
2 -0.71018723 -0.4193710  0.15820110 0.05849252
3 -0.05987773 -0.2978596 -0.08335957 0.10149408

I would like to calculate the loglikelihood value of the multinomial logistic regression using these estimated parameters.

Any help is appreciated.


Solution

  • This should work. The log-likelihood is just the sum of the log of the probabilities that each observation takes on its observed value. In the code below probs is an N x m matrix of probabilities for each of the N observations on each of the m categories. We can then get y from the model frame and turn it into a numeric variable which will indicate the category number. We then use cbind(1:length(y), y) to index the probability matrix. This makes an N x 2 matrix that gives for each row number (in the first column) the column number of the probs matrix that you should keep. So, probs[cbind(1:length(y), y)] creates a vector of probabilities that each observation takes on its observed y value. We can log them and then sum them to get the log-likelihood.

    df=data.frame(x1=rnorm(100), #predictor 1
                  x2=rpois(100,2.5), #predictor 2
                  x3=rgeom(100,prob = 0.48), #predictor 3
                  y=as.factor(sample(1:3,100,replace = T)) #categorical response
    )
    
    mod <- nnet::multinom(formula = y ~ ., data = df)
    
    probs <- predict(mod, type="probs")
    y <- as.numeric(model.response(model.frame(mod)))
    indiv_ll <- log(probs[cbind(1:length(y), y)])
    sum(indiv_ll)
    # [1] -106.8012
    logLik(mod)
    # 'log Lik.' -106.8012 (df=8)