Tags: r, statistics, least-squares, nonlinear-optimization

Matching lm and optim coefficient estimates for linear model with multiplicative error


This question is a mix of math and programming, but I'm guessing the solution lies on the programming side of things.

Suppose I have a linear model with a multiplicative error:

y_i = (a + b*x_i) * u_i,   with u_i > 0

I'd like to estimate my coefficients a and b in R. I've found the solution in the top answer here and the proof seems to make sense. I've also found out how to do OLS with heteroskedasticity-robust standard errors here. My interpretation of the results between the two resources is that the estimated values of the coefficients in both plain-Jane OLS and heteroskedasticity-robust OLS stay the same, but the t-values, F-values, and standard errors will differ (I include a quick check of this claim after my output below). However, I don't care about those, only the estimates of the coefficients. It seems to follow that if I were to log the original equation, giving

log(y_i) = log(a + b*x_i) + log(u_i),

and then minimize the following sum of squared log-scale residuals through an optimization function in R,

Σ_i (log(y_i) - log(a + b*x_i))^2,

then the coefficient estimates should match those of lm(y~x)$coefficients. I'm not seeing that. Here's my code so far.

library(dplyr)
library(wooldridge)

# Get the data ready.

data("saving")

saving <- saving %>% filter(sav > 0, 
                            inc < 20000, 
                            sav < inc)

x = saving$inc
y = saving$sav

# Define LinearLogError and generate coefficient estimates.

LinearLogError = function(coeffs){
  a = coeffs[1]; b = coeffs[2]
  # Sum of squared residuals on the log scale.
  yhat = log(a + b*x)
  return(sum((log(y) - yhat)^2))
}

lmCoeffs = lm(y~x)$coefficients

startCoeffs = c(1, 1)
optimCoeffs = optim(par = startCoeffs, fn = LinearLogError)$par

# Results.

lmCoeffs
optimCoeffs

However, the results are:

> lmCoeffs
(Intercept)           x 
316.1983535   0.1405155 
> optimCoeffs
[1] -237.0579080    0.1437663
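
As a quick check of the claim above: robust standard errors change only the inference columns, not the point estimates. This snippet is illustrative and uses the sandwich and lmtest packages, which aren't part of my attempt above:

library(sandwich)
library(lmtest)

olsFit <- lm(y ~ x)
coeftest(olsFit)                                        # classical standard errors
coeftest(olsFit, vcov. = vcovHC(olsFit, type = "HC1"))  # heteroskedasticity-robust
# The Estimate column is identical in both tables; only the standard
# errors, t-values, and p-values differ.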

So my question is: am I understanding the solution correctly -- i.e. is my math correct? If yes, then what do I need to do in R to see results similar to lmCoeffs? If not, what don't I understand, and what's the correct way to go about finding the proper coefficient estimates for my problem?

*Edited: Corrected a typo in my code.


Solution

  • You are optimizing a different least-squares objective, so there is no reason to assume the two approaches should give you the same coefficients.
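
    For instance, pointing optim at the untransformed sum of squares does reproduce lm's estimates, which suggests the discrepancy comes from the objective, not from optim itself. This is a quick illustration (not part of the original answer) reusing the x and y defined in the question; the optimizer settings may need tuning:

    # Ordinary least-squares objective, i.e. residuals on the original scale.
    LinearError <- function(coeffs) {
      a <- coeffs[1]; b <- coeffs[2]
      sum((y - (a + b * x))^2)
    }
    optim(par = c(1, 1), fn = LinearError, method = "BFGS",
          control = list(maxit = 1000, reltol = 1e-12))$par
    # Should be close to lm(y ~ x)$coefficients.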

    Quoting from the first post you linked:

    It's easy to verify now that ν_i, the thing in square brackets, conditional on x, has mean zero and variance (a·x_i + b)²σ². So, this multiplicative errors model is just a cleverly disguised linear model with heteroskedasticity.

    This means the assumption behind a normal linear regression, homoskedasticity (equal error variance), doesn't hold here. The second post you linked shows another way to test that your coefficients are non-zero after running a normal linear regression.

    If what you actually need are good estimates of your coefficients, you need to run a linear regression that accounts for unequal variances. That is definitely not what you have in the optimized function: you don't need to divide by yhat (as in your pre-edit code), and it's not clear how you ensure a + b*x stays positive so that log(a + b*x) is defined.
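
    To make that concrete, here is a hand-rolled feasible-GLS sketch (an illustration, not from the original answer; it assumes the OLS fitted values are all positive):

    # Var(y | x) is proportional to (a + b*x)^2, so weight each
    # observation by 1 / fitted^2 and refit.
    ols  <- lm(sav ~ inc, data = saving)               # step 1: plain OLS
    w    <- 1 / fitted(ols)^2                          # step 2: inverse-variance weights
    fgls <- lm(sav ~ inc, data = saving, weights = w)  # step 3: weighted refit
    coef(fgls)
    # This can be iterated, re-deriving the weights from each new fit,
    # until the coefficients stabilize.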

    Alternatively, you can try the gls function from the nlme package together with a variance structure matching the quote above, Var(y | x) = (a·x + b)²σ²; varConstPower models σ²(c + |x|^p)², which has exactly that form when p = 1:

    library(nlme)
    vf  <- varConstPower(form = ~ inc)
    fit <- gls(sav ~ inc, weights = vf, data = saving)
    fit
    
    Generalized least squares fit by REML
      Model: sav ~ inc 
      Data: saving 
      Log-restricted-likelihood: -641.6587
    
    Coefficients:
    (Intercept)         inc 
    177.8608409   0.1557556
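
    To pull out just the point estimates for a side-by-side comparison with OLS (a minimal follow-up, not part of the original answer):

    coef(fit)                            # GLS estimates under the fitted variance structure
    coef(lm(sav ~ inc, data = saving))   # plain OLS estimates, for comparison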