Search code examples
juliaeconomics

Logitstic Regression In Julia Returning Incorrect results


I am trying to use optimize from Julia's Optim package to estimate the vector beta = [beta_0, beta_1]' , but I am getting unreasonable results.

I've provided a minimum working example where the results estimate [27.04, -14.38] when the true values are [1, 1].

What am I doing wrong here?

Here is a minimum working example. It's my first one, so please let me know how it could be improved.

using Distributions
using Optim
using LinearAlgebra
import Random

Random.seed!(42)

# generate data
true_beta = [1; 1];
N=500;
X = [ones(N) rand(Normal(0,1), N)];
u = rand(Normal(0,1), N)
# generate the latent variable
y_star = X * true_beta + u;
# generate observed variable
y = Vector{Int64}(zeros(N));
for i in 1:N
    if y_star[i] >= 0
        y[i] = 1
    end
end

# (negative of) loglikelihood function
function loglike(beta::Vector{Float64})
    l = Vector{Float64}()
    pk = 1/(1+exp(-X[i,:]'*beta))
    lhood = -y[i,1]*log(pk) - (1-y[i,1])*log(1-pk)
    for i in 1:size(y,1)
        push!(l, lhood)
    end
    return sum(l)
end

# initial guess: ols
ols = inv(X'X)X'y;
# minimize negative loglikelihood
res = optimize(loglike, ols)
# save parameter estimates of beta
betahat = res.minimizer

Solution

  • The reason is that your function is not defined correctly. It should be:

    function loglike(beta::Vector{Float64})
        l = Vector{Float64}()
        for i in 1:size(y,1)
            pk = 1/(1+exp(-X[i,:]'*beta))
            lhood = -y[i]*log(pk) - (1-y[i])*log(1-pk)
            push!(l, lhood)
        end
        return sum(l)
    end
    

    And you can check that the result is correct by running:

    using GLM
    glm(@formula(y~x), (y=y, x=X[:, 2]), Binomial(), LogitLink())
    

    Also notice that your data generating process is incorrect. You are adding normal noise and you should add logistic noise. With normal noise the correct model is Probit. If you use it e.g. like:

    glm(@formula(y~x), (y=y, x=X[:, 2]), Binomial(), ProbitLink())
    

    you will indeed recover both parameters to be around 1.