Tags: r, optimization, regression, logistic-regression, glm

Estimate a probit regression model with optim()


I need to program a probit regression model manually, without using glm. I want to use optim() for direct minimization of the negative log-likelihood.

I wrote the code below, but it does not work, giving the error:

cannot coerce type 'closure' to vector of type 'double'

# load data (provided via the link at the bottom); `read.dta` is in package foreign
library(foreign)
Datospregunta2a <- read.dta("problema2_1.dta")
attach(Datospregunta2a)

# model matrix `X` and response `Y`
X <- cbind(1, associate_professor, full_professor, emeritus_professor, other_rank)
Y <- volunteer

# number of regression coefficients
K <- ncol(X)

# initial guess on coefficients
vi <- lm(volunteer ~ associate_professor, full_professor, emeritus_professor, other_rank)$coefficients

# negative log-likelihood
probit.nll <- function (beta) {
  exb <- exp(X%*%beta)
  prob<- rnorm(exb)
  logexb <- log(prob)
  y0 <- (1-y)
  logexb0 <- log(1-prob)
  yt <- t(y)
  y0t <- t(y0)
  -sum(yt%*%logexb + y0t%*%logexb0)
  }

# gradient
probit.gr <- function (beta) {
  grad <- numeric(K)
  exb <- exp(X%*%beta)
  prob <- rnorm(exb)
  for (k in 1:K) grad[k] <- sum(X[,k]*(y - prob))
  return(-grad)
  }

# direct minimization
fit <- optim(vi, probit.nll, gr = probit.gr, method = "BFGS", hessian =  TRUE)

data: https://drive.google.com/file/d/0B06Id6VJyeb5OTFjbHVHUE42THc/view?usp=sharing


Solution

  • case sensitivity

    R is case sensitive: Y and y are different objects. Use Y, not y, inside your functions probit.nll and probit.gr.
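
    A minimal (throwaway) illustration of the resulting lookup failure:

    Y <- c(0, 1, 1)
    y
    # Error: object 'y' not found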

    These two functions also contain a more fundamental bug: rnorm draws random samples from the normal distribution, whereas the probit link needs pnorm, the standard normal CDF, to map the linear predictor to a probability (the exp is also unnecessary).
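
    To see the difference between the two (values shown for set.seed(1)):

    set.seed(1)
    rnorm(3)             # random draws: different on every (unseeded) run
    # -0.6264538  0.1836433 -0.8356286
    pnorm(c(-1, 0, 1))   # deterministic CDF values P(Z <= z)
    # 0.1586553 0.5000000 0.8413447

    The following are corrected versions.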

    negative log-likelihood function

    # requires model matrix `X` and binary response `Y`
    probit.nll <- function (beta) {
      # linear predictor
      eta <- X %*% beta
      # probability
      p <- pnorm(eta)
      # negative log-likelihood
      -sum((1 - Y) * log(1 - p) + Y * log(p))
      }
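
    As a side note, when |eta| is large, pnorm(eta) can round to exactly 0 or 1, so log(p) or log(1 - p) returns -Inf. A sketch of a numerically safer variant, using pnorm's log.p argument (same X and Y assumed):

    probit.nll.stable <- function (beta) {
      eta <- X %*% beta
      # log P(Y = 1) and log P(Y = 0) computed directly on the log scale
      -sum(Y * pnorm(eta, log.p = TRUE) + (1 - Y) * pnorm(-eta, log.p = TRUE))
      }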
    

    gradient function

    # requires model matrix `X` and binary response `Y`
    probit.gr <- function (beta) {
      # linear predictor
      eta <- X %*% beta
      # probability
      p <- pnorm(eta)
      # chain rule
      u <- dnorm(eta) * (Y - p) / (p * (1 - p))
      # gradient
      -crossprod(X, u)
      }
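
    A quick finite-difference check of the analytic gradient (a sanity test; assumes X, Y, and the two functions above are in the workspace):

    K <- ncol(X)
    beta0 <- rep(0, K)
    eps <- 1e-6
    fd <- sapply(1:K, function (k) {
      e <- replace(numeric(K), k, eps)
      (probit.nll(beta0 + e) - probit.nll(beta0 - e)) / (2 * eps)
      })
    # largest absolute discrepancy; should be near zero (~1e-8 or smaller)
    max(abs(fd - probit.gr(beta0)))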
    

    initial parameter values from lm()

    This is not a sound idea: linear regression should not be applied to binary data.

    However, purely focusing on the use of lm, note that you need + rather than , to separate covariates on the right-hand side of the formula. As written, the comma makes R pass full_professor as lm's data argument and the remaining names as further positional arguments.
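
    With that fixed, the call from the question would read:

    vi <- lm(volunteer ~ associate_professor + full_professor +
             emeritus_professor + other_rank)$coefficients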


    reproducible example

    Let's generate a toy dataset:

    set.seed(0)
    # model matrix
    X <- cbind(1, matrix(runif(300, -2, 1), 100))
    # coefficients
    b <- runif(4) 
    # response
    Y <- rbinom(100, 1, pnorm(X %*% b))
    
    # `glm` estimate
    GLM <- glm(Y ~ X - 1, family = binomial(link = "probit"))
    
    # our own estimation via `optim`
    # I am using `b` as initial parameter values (being lazy)
    fit <- optim(b, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE)
    
    # comparison
    unname(coef(GLM))
    # 0.62183195  0.38971121  0.06321124  0.44199523
    
    fit$par
    # 0.62183540  0.38971287  0.06321318  0.44199659
    

    They are very close to each other!
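
    Since hessian = TRUE was requested, approximate standard errors also come for free, from the inverse of the Hessian of the negative log-likelihood at the optimum; they should likewise agree closely with glm's:

    # standard errors from the observed information
    se <- sqrt(diag(solve(fit$hessian)))
    # compare with the standard errors reported by `glm`
    unname(summary(GLM)$coefficients[, "Std. Error"])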