Consider a very basic estimation command, regress. In the manual, under Methods and Formulas, we read:
So, according to the manual, for fweights, Stata takes my vector of weights (supplied with fw=) and creates a diagonal matrix D. Now, a diagonal matrix is equal to its own transpose. Therefore, we could define D=C'C=C^2, where C is a diagonal matrix containing the square roots of my weights on the diagonal.
Now, given my notation and the text above, we can reproduce Stata's method by premultiplying both X and y (and Z) by the matrix C. This way, (CX)'CX=X'C'CX=X'DX, and so on. In practice we achieve this by multiplying each variable by the square root of the weight, observation by observation.
Now, I tried to replicate Stata's estimates manually, but I get a different result. Example code below:
webuse auto, clear
keep if !mi(rep78)
qui regress price weight length [fw=rep78]
estimates store stata
preserve
replace price = price*sqrt(rep78)
replace weight = weight*sqrt(rep78)
replace length = length*sqrt(rep78)
qui regress price weight length
estimates store me
restore
estimates table stata me, b
With output:
----------------------------------------
    Variable |    stata          me
-------------+--------------------------
      weight |  4.1339379    1.7738167
      length | -82.996394    16.502356
       _cons |  9425.5443   -4071.7341
----------------------------------------
The match is terrible. The same happens if we replace fw= with other types of weights. What's the issue? Is my math or code wrong? If not, how is Stata actually implementing the weights?
This mismatch happens because you forgot to scale the intercept/constant:
webuse auto, clear
keep if !mi(rep78)
qui regress price weight length [fw=rep78]
estimates store stata
preserve
replace price = price*sqrt(rep78)
replace weight = weight*sqrt(rep78)
replace length = length*sqrt(rep78)
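* scale the column of ones too, so the constant is weighted like every other regressor
gen constant = 1*sqrt(rep78)
* nocons drops the automatic, unweighted intercept
qui regress price weight length constant, nocons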
estimates store me
restore
estimates table stata me, b
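This replaces the usual column of ones with a "pseudo-intercept": the same column multiplied by the square root of the weight. Its coefficient plays the role of _cons in the weighted regression.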
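One way to see why the constant needs the same treatment is to write the intercept explicitly as a column of ones inside X. The sketch below assumes the weighted estimator has the usual form (X'DX)^{-1}X'Dy; the symbols \iota (column of ones), w_i (the weights) and x_1, x_2 (the two regressors) are my own shorthand, not the manual's notation:

```latex
\begin{aligned}
X &= \begin{bmatrix} x_1 & x_2 & \iota \end{bmatrix}, \qquad
C = \operatorname{diag}\!\left(\sqrt{w_1}, \dots, \sqrt{w_n}\right), \qquad
D = C'C, \\
b &= (X'DX)^{-1} X'Dy = \bigl((CX)'(CX)\bigr)^{-1} (CX)'(Cy), \\
CX &= \begin{bmatrix} Cx_1 & Cx_2 & C\iota \end{bmatrix}, \qquad
C\iota = \left(\sqrt{w_1}, \dots, \sqrt{w_n}\right)'.
\end{aligned}
```

So the transformed regression has to include C\iota, that is sqrt(rep78), as a regressor and suppress the automatic constant. Leaving the default constant in place regresses Cy on Cx_1, Cx_2 and an unscaled \iota, which is a different model and therefore gives different coefficients.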