
Difference in Stata's margins command using distinct grouping syntax


When estimating margins in Stata over different categories, AFAICS there are two different syntax options.

In the following example we first run a simple regression with GPA as the outcome. We then want to estimate the margins at different GRE values for each level of the binary variable admit:

use https://stats.idre.ucla.edu/stat/stata/dae/binary.dta, clear
reg gpa gre rank i.admit
margins admit, at(gre = (220 (100) 800))
margins, over(admit) at(gre = (220 (100) 800))

From what I understand, the last two margins commands should be equivalent. However, comparing the results shows some differences in the margins and standard errors! The differences are not big in the example data, but they can become quite significant, as I have seen in big data sets. Is there a plausible explanation for this?


Solution

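  • The two margins commands use different values of your covariate rank. margins admit sets admit to each level in turn for every observation and averages the predictions over the whole sample, which in this linear model amounts to fixing rank at its overall mean. margins, over(admit) averages within each admit group, so it uses the admit-specific means of rank.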

    The code below shows how you can replicate the margins results by hand with a simple example:

    use https://stats.idre.ucla.edu/stat/stata/dae/binary.dta, clear
    
    regress gpa gre i.admit rank
    
    * Make prediction using mean(rank)
    sum rank 
    gen pred          = _b[_cons] + _b[gre] * gre + _b[rank] * r(mean) + _b[1.admit] * admit 
    
    * Make prediction using mean(rank|admit)
    bys admit: egen admit_specific_rank = mean(rank)
    gen pred_for_over = _b[_cons] + _b[gre] * gre + _b[rank] * admit_specific_rank + _b[1.admit] * admit 
    
    * Predicted values match margins output:
    margins admit,       at(gre = 300) 
    tab admit if gre==300, sum(pred)
    
    margins, over(admit) at(gre = 300) 
    tab admit if gre==300, sum(pred_for_over)
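
    Because the model is linear, the gap between the two margins for a given admit level should reduce to _b[rank] times the difference between the admit-specific mean of rank and the overall mean of rank, regardless of the gre value in at(). The lines below are a minimal sketch of that check, assuming the estimation sample is the full dataset; the scalar names (m_all, m_0, m_1, gap_0, gap_1) are made up for illustration:

    ```stata
    use https://stats.idre.ucla.edu/stat/stata/dae/binary.dta, clear
    regress gpa gre i.admit rank

    * Overall and admit-specific means of rank (hypothetical scalar names)
    quietly summarize rank
    scalar m_all = r(mean)
    quietly summarize rank if admit == 0
    scalar m_0 = r(mean)
    quietly summarize rank if admit == 1
    scalar m_1 = r(mean)

    * Expected gap: (margins, over(admit)) minus (margins admit), per admit level
    scalar gap_0 = _b[rank] * (scalar(m_0) - scalar(m_all))
    scalar gap_1 = _b[rank] * (scalar(m_1) - scalar(m_all))
    display "admit = 0: " scalar(gap_0)
    display "admit = 1: " scalar(gap_1)
    ```

    If the two displayed gaps match the differences between the two margins tables, the discrepancy comes entirely from how rank is averaged. This would also explain why it can grow in data sets where the covariate means differ strongly across groups.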