How is Stata implementing weights?


Consider a very basic estimation command, regress. In the manual, under Methods and Formulas, we read:

[Image: excerpt from the Methods and Formulas section of the regress manual entry]

So, according to the manual, for fweights Stata takes my vector of weights (supplied via fw=) and creates a diagonal matrix D. A diagonal matrix equals its own transpose, so we can write D=C'C=C^2, where C is the diagonal matrix with the square roots of my weights on the diagonal.

Given this notation and the text above, we should be able to reproduce Stata's method by premultiplying X and y (and Z) by the matrix C. This way, (CX)'CX=X'C'CX=X'DX, and so on. In practice, this means multiplying each variable by the square root of the weight, observation by observation.
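Written out, the algebra in the paragraph above looks as follows. This is only a sketch: it assumes the coefficient vector has the weighted least-squares form b=(X'DX)^{-1}X'Dy implied by the manual excerpt, and w_i (the weight of observation i) is notation introduced here for illustration.

```latex
\begin{aligned}
D &= \operatorname{diag}(w_1,\dots,w_n), \qquad
C = \operatorname{diag}\!\left(\sqrt{w_1},\dots,\sqrt{w_n}\right), \qquad
D = C'C \\
X'DX &= X'C'CX = (CX)'(CX) \\
X'Dy &= X'C'Cy = (CX)'(Cy) \\
b &= (X'DX)^{-1}X'Dy = \bigl[(CX)'(CX)\bigr]^{-1}(CX)'(Cy)
\end{aligned}
```

Row i of CX is row i of X multiplied by \sqrt{w_i}, and likewise for Cy.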

Now, I tried to replicate Stata's estimates manually, but I get a different result. Example code below:

    webuse auto, clear

    keep if !mi(rep78)
    qui regress price weight length [fw=rep78]
    estimates store stata
    preserve
    replace price = price*sqrt(rep78)
    replace weight = weight*sqrt(rep78)
    replace length = length*sqrt(rep78)

    qui regress price weight length
    estimates store me
    restore
    estimates table stata me, b

With output:

----------------------------------------
    Variable |   stata          me      
-------------+--------------------------
      weight |  4.1339379    1.7738167  
      length | -82.996394    16.502356  
       _cons |  9425.5443   -4071.7341  
----------------------------------------

The match is terrible. The same thing happens if I replace fw= with the other weight types. What's the issue? Is my math or my code wrong? If not, how does Stata actually implement the weights?


Solution

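  • The mismatch happens because you forgot to scale the intercept/constant. The column of ones is part of X, so it has to be multiplied by the square root of the weight as well: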

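    webuse auto, clear
    keep if !mi(rep78)
    * Reference fit: let Stata apply the frequency weights
    qui regress price weight length [fw=rep78]
    estimates store stata
    preserve
    * Scale the outcome and the regressors by sqrt(weight)
    replace price = price*sqrt(rep78)
    replace weight = weight*sqrt(rep78)
    replace length = length*sqrt(rep78)
    * Scale the column of ones in the same way
    gen constant = 1*sqrt(rep78)
    * nocons: the scaled column stands in for the intercept
    qui regress price weight length constant, nocons
    estimates store me
    restore
    estimates table stata me, b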
    

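    This replaces the usual column of ones with a "pseudo-intercept" equal to sqrt(rep78), which is the weighted version of the constant. The nocons option stops regress from adding its own unscaled constant on top of it.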
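As a rough numerical check of the point above, the two coefficient vectors can be compared directly rather than by eye. This is only a sketch: the names sw, b_fw, b_me and the _s suffix are made up for illustration, and the scaled copies are stored as double to limit rounding error.

```stata
webuse auto, clear
keep if !mi(rep78)

* Reference fit: Stata applies the frequency weights itself
quietly regress price weight length [fw=rep78]
matrix b_fw = e(b)

* sw is the scaled column of ones (hypothetical name)
generate double sw = sqrt(rep78)

* Scaled copies of the outcome and regressors, leaving the originals intact
foreach v of varlist price weight length {
    generate double `v'_s = `v' * sw
}

* sw stands in for the intercept, so suppress the automatic constant
quietly regress price_s weight_s length_s sw, noconstant
matrix b_me = e(b)

* Largest relative difference between the two coefficient vectors
display mreldif(b_fw, b_me)
```

If the scaling argument holds, the displayed value should be close to zero. Only the point estimates are expected to agree. With fweights, Stata counts the sum of the weights as the number of observations, so the standard errors from the rescaled regression will generally differ from those of the weighted one.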