This question is a mix of math and programming, but I'm guessing the solution lies on the programming side of things.
Suppose I have a linear model with a multiplicative error,

y = (a + b*x) * u,

where u is a positive multiplicative error term.
I'd like to estimate my coefficients a and b in R. I've found the solution in the top answer here, and the proof seems to make sense. I've also found out how to do OLS with heteroskedasticity-robust standard errors here. My interpretation of the two resources is that the estimated coefficients in both plain-Jane OLS and heteroskedasticity-robust OLS stay the same, but the t-values, F-values, and standard errors will differ. However, I don't care about those, only the estimates of the coefficients. It seems to follow that if I were to log the original equation,

log(y) = log(a + b*x) + log(u),

and then minimize the following through an optimization function in R,

sum over i of [log(y_i) - log(a + b*x_i)]^2,

then the results for the coefficients should match those of lm(y~x)$coefficients. I'm not seeing that. Here's my code so far.
library(dplyr)
library(wooldridge)
# Get the data ready.
data("saving")
saving <- saving %>% filter(sav > 0,
                            inc < 20000,
                            sav < inc)
x = saving$inc
y = saving$sav
# Define LinearLogError and generate coefficient estimates.
LinearLogError = function(coeffs){
  a = coeffs[1]; b = coeffs[2]
  yhat = log(a + b*x)
  return(sum((log(y) - yhat)^2))
}
lmCoeffs = lm(y~x)$coefficients
startCoeffs = c(1, 1)
optimCoeffs = optim(par = startCoeffs, fn = LinearLogError)$par
# Results.
lmCoeffs
optimCoeffs
However, the results are:
> lmCoeffs
(Intercept)           x 
316.1983535   0.1405155 
> optimCoeffs
[1] -237.0579080    0.1437663
So my question is: am I understanding the solution correctly -- i.e. is my math correct? If yes, then what do I need to do in R to see results similar to lmCoeffs? If not, what don't I understand, and what's the correct way to go about finding the proper coefficient estimates for my problem?
*Edited: Corrected a typo in my code.*
You are optimizing two different least-squares problems, so there is no reason to assume they should give you the same coefficients.
So, quoting from the first answer you linked:
It's easy to verify now that ν_i, the thing in square brackets, conditional on x, has mean zero and variance (a x_i + b)^2 σ^2. So, this multiplicative errors model is just a cleverly disguised linear model with heteroskedasticity.
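(To unpack "the thing in square brackets": assuming the model in that answer is y_i = (a x_i + b) u_i with E[u_i | x_i] = 1 and Var(u_i | x_i) = σ^2, the rewrite is

y_i = (a x_i + b) u_i = a x_i + b + [(a x_i + b)(u_i - 1)],

so ν_i = (a x_i + b)(u_i - 1) is the bracketed term, with conditional mean zero and conditional variance (a x_i + b)^2 σ^2.)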
This means the homoskedasticity (equal variance) assumption behind a normal linear regression doesn't hold. The second post you linked shows another way to test whether your coefficients are zero after running a normal linear regression.
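For reference, that robust-SE route might look roughly like the sketch below (just an illustration, assuming the sandwich and lmtest packages and your filtered saving data; ols is an illustrative name). The coefficient estimates are identical to plain lm(); only the standard errors and test statistics change:
library(sandwich)
library(lmtest)
# Ordinary OLS fit; the robust step does not change the coefficient estimates.
ols <- lm(sav ~ inc, data = saving)
# Heteroskedasticity-robust (sandwich) standard errors and t-statistics.
coeftest(ols, vcov = vcovHC(ols, type = "HC1"))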
If what you actually need are good estimates of your coefficients, you need to run a linear regression that allows for unequal variances. That is definitely not what your optimized function does: you don't need to divide by yhat, and I am not sure how you ensure that a + b*x, the argument of the log, stays positive.
You can try the gls function in R, specifying a variance structure of the kind laid out in the quote above, i.e. variance proportional to (a*x + b)^2:
library(nlme)
vf <- varConstPower(form = ~ inc)
fit <- gls(sav ~ inc, weights = vf, data = saving)
fit
Generalized least squares fit by REML
  Model: sav ~ inc 
  Data: saving 
  Log-restricted-likelihood: -641.6587

Coefficients:
(Intercept)         inc 
177.8608409   0.1557556 
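Since you only care about the point estimates, a quick cross-check (just a sketch, not the only way to do this; the names ols, w and fgls are illustrative) is a two-step feasible GLS: fit OLS once, build weights from the squared fitted means, and refit with lm() using those weights:
# Two-step feasible GLS, assuming Var(sav | inc) is proportional to the
# squared conditional mean and that the OLS fitted values are positive.
ols  <- lm(sav ~ inc, data = saving)
w    <- 1 / fitted(ols)^2                 # weights proportional to 1/(a + b*inc)^2
fgls <- lm(sav ~ inc, data = saving, weights = w)
coef(fgls)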