Tags: r, statistics, least-squares, nonlinear-optimization

Matching lm and optim coefficient estimates for linear model with multiplicative error


This question is a mix of math and programming, but I'm guessing the solution lies on the programming side of things.

Suppose I have a linear model with a multiplicative error:

y_i = (a + b*x_i) * u_i,   with u_i > 0

I'd like to estimate my coefficients a and b in R. I've found the solution in the top answer here and the proof seems to make sense. I've also found out how to do OLS with heteroskedasticity-robust standard errors here. My interpretation of the results between the two resources is that the estimated values of the coefficients in both plain-Jane OLS and heteroskedasticity-robust OLS stay the same, but the t-values, F-values, and standard errors will differ (I include a quick check of this claim after my output below). However, I don't care about those, only the estimates of the coefficients. It seems to follow that if I were to log the original equation, giving

log(y_i) = log(a + b*x_i) + log(u_i),

and then minimize the following sum of squared log-scale residuals through an optimization function in R,

Σ_i (log(y_i) - log(a + b*x_i))^2,

then the coefficient estimates should match those of lm(y~x)$coefficients. I'm not seeing that. Here's my code so far.

library(dplyr)
library(wooldridge)

# Get the data ready.

data("saving")

saving <- saving %>% filter(sav > 0, 
                            inc < 20000, 
                            sav < inc)

x = saving$inc
y = saving$sav

# Define LinearLogError and generate coefficient estimates.

LinearLogError = function(coeffs){
  a = coeffs[1]; b = coeffs[2]
  # Sum of squared residuals on the log scale.
  yhat = log(a + b*x)
  return(sum((log(y) - yhat)^2))
}

lmCoeffs = lm(y~x)$coefficients

startCoeffs = c(1, 1)
optimCoeffs = optim(par = startCoeffs, fn = LinearLogError)$par

# Results.

lmCoeffs
optimCoeffs

However, the results are:

> lmCoeffs
(Intercept)           x 
316.1983535   0.1405155 
> optimCoeffs
[1] -237.0579080    0.1437663
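
As a quick check of the claim above: robust standard errors change only the inference columns, not the point estimates. This snippet is illustrative and uses the sandwich and lmtest packages, which aren't part of my attempt above:

library(sandwich)
library(lmtest)

olsFit <- lm(y ~ x)
coeftest(olsFit)                                        # classical standard errors
coeftest(olsFit, vcov. = vcovHC(olsFit, type = "HC1"))  # heteroskedasticity-robust
# The Estimate column is identical in both tables; only the standard
# errors, t-values, and p-values differ.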

So my question is: am I understanding the solution correctly -- i.e. is my math correct? If yes, then what do I need to do in R to see results similar to lmCoeffs? If not, what don't I understand, and what's the correct way to go about finding the proper coefficient estimates for my problem?

*Edited: Corrected a typo in my code.


Solution

  • You are optimizing a different least-squares objective, so there is no reason to assume the two approaches should give you the same coefficients.
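
    For instance, pointing optim at the untransformed sum of squares does reproduce lm's estimates, which suggests the discrepancy comes from the objective, not from optim itself. This is a quick illustration (not part of the original answer) reusing the x and y defined in the question; the optimizer settings may need tuning:

    # Ordinary least-squares objective, i.e. residuals on the original scale.
    LinearError <- function(coeffs) {
      a <- coeffs[1]; b <- coeffs[2]
      sum((y - (a + b * x))^2)
    }
    optim(par = c(1, 1), fn = LinearError, method = "BFGS",
          control = list(maxit = 1000, reltol = 1e-12))$par
    # Should be close to lm(y ~ x)$coefficients.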

    Quoting from the first post you linked:

    It's easy to verify now that ν_i, the thing in square brackets, conditional on x, has mean zero and variance (a·x_i + b)²σ². So, this multiplicative errors model is just a cleverly disguised linear model with heteroskedasticity.

    This means the assumption behind a normal linear regression, homoskedasticity (equal error variance), doesn't hold here. The second post you linked shows another way to test that your coefficients are non-zero after running a normal linear regression.

    If what you actually need are good estimates of your coefficients, you need to run a linear regression that accounts for unequal variances. That is definitely not what you have in the optimized function: you don't need to divide by yhat (as in your pre-edit code), and it's not clear how you ensure a + b*x stays positive so that log(a + b*x) is defined.
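
    To make that concrete, here is a hand-rolled feasible-GLS sketch (an illustration, not from the original answer; it assumes the OLS fitted values are all positive):

    # Var(y | x) is proportional to (a + b*x)^2, so weight each
    # observation by 1 / fitted^2 and refit.
    ols  <- lm(sav ~ inc, data = saving)               # step 1: plain OLS
    w    <- 1 / fitted(ols)^2                          # step 2: inverse-variance weights
    fgls <- lm(sav ~ inc, data = saving, weights = w)  # step 3: weighted refit
    coef(fgls)
    # This can be iterated, re-deriving the weights from each new fit,
    # until the coefficients stabilize.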

    Alternatively, you can try the gls function from the nlme package together with a variance structure matching the quote above, Var(y | x) = (a·x + b)²σ²; varConstPower models σ²(c + |x|^p)², which has exactly that form when p = 1:

    library(nlme)
    vf  <- varConstPower(form = ~ inc)
    fit <- gls(sav ~ inc, weights = vf, data = saving)
    fit
    
    Generalized least squares fit by REML
      Model: sav ~ inc 
      Data: saving 
      Log-restricted-likelihood: -641.6587
    
    Coefficients:
    (Intercept)         inc 
    177.8608409   0.1557556
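
    To pull out just the point estimates for a side-by-side comparison with OLS (a minimal follow-up, not part of the original answer):

    coef(fit)                            # GLS estimates under the fitted variance structure
    coef(lm(sav ~ inc, data = saving))   # plain OLS estimates, for comparison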