When estimating margins in Stata over different categories, AFAICS there are two different syntax options.
In the following example, we first run a simple regression with GPA as the outcome. We then want to estimate the margins at different GRE values over the binary category "admit":
use https://stats.idre.ucla.edu/stat/stata/dae/binary.dta, clear
reg gpa gre rank i.admit
margins admit, at(gre = (220 (100) 800))
margins, over(admit) at(gre = (220 (100) 800))
From what I understand, the last two margins commands should be equivalent. However, comparing the results shows some differences in the margins and standard errors! The differences are not big in the example data, but they can become quite substantial, as I saw in big data sets. Is there a plausible explanation for this?
The two `margins` techniques assume different values of your covariate `rank`. `margins admit` fixes `rank` at its global mean. `margins, over(admit)` uses the `admit`-specific averages of `rank`.
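To make the difference explicit, here is a sketch of what each command computes for the linear model in the question, with GRE fixed at some value g. The notation is my own and not Stata output: the betas are the regression coefficients, \overline{rank} is the mean of rank over the whole estimation sample, and \overline{rank}_a is the mean of rank among observations with admit = a. Because the model is linear, averaging the predictions over a set of observations gives the same number as predicting at that set's mean of rank.

```latex
\begin{align*}
\text{margins admit:} \quad
  m_a(g) &= \beta_0 + \beta_{gre}\, g + \beta_{admit}\, a
            + \beta_{rank}\, \overline{rank} \\[4pt]
\text{margins, over(admit):} \quad
  m^{over}_a(g) &= \beta_0 + \beta_{gre}\, g + \beta_{admit}\, a
            + \beta_{rank}\, \overline{rank}_a \\[4pt]
\text{difference:} \quad
  m^{over}_a(g) - m_a(g) &= \beta_{rank}\,
            \bigl(\overline{rank}_a - \overline{rank}\bigr)
\end{align*}
```

So the gap between the two sets of margins grows with the size of the coefficient on rank and with how far the group-specific mean of rank sits from the overall mean, which would explain why it can look small in one data set and large in another.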
The code below shows how you can replicate the `margins` results by hand with a simple example:
use https://stats.idre.ucla.edu/stat/stata/dae/binary.dta, clear
regress gpa gre i.admit rank
* Make prediction using mean(rank)
sum rank
gen pred = _b[_cons] + _b[gre] * gre + _b[rank] * r(mean) + _b[1.admit] * admit
* Make prediction using mean(rank|admit)
bys admit: egen admit_specific_rank = mean(rank)
gen pred_for_over = _b[_cons] + _b[gre] * gre + _b[rank] * admit_specific_rank + _b[1.admit] * admit
* Predicted values match margins output:
margins admit, at(gre = 300)
tab admit if gre==300, sum(pred)
margins, over(admit) at(gre = 300)
tab admit if gre==300, sum(pred_for_over)