Margins after seemingly unrelated regression too slow in Stata


I have a data set with 3 million observations. I need to estimate a linear probability model (LPM) with seemingly unrelated regression (SUR) and get the marginal effects.

I used gsem ..., vce(cluster x), then margins, ... force. But it takes a very long time to get the margins result (more than 2 hours). I need the standard errors for confidence intervals, so I can't use the nose option.

Are there other ways to improve the speed?


Solution

  • The exact code depends on which marginal effects you mean. You can calculate partial effects with lincom, which will most likely be faster than margins.

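    As an example, suppose we estimate this model:

    $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \varepsilon$$

    The partial effect of x1 on y can be obtained by taking the partial derivative with respect to x1:

    $$\frac{\partial y}{\partial x_1} = \beta_1 + \beta_4 x_2 + \beta_5 x_3$$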

    We can get the effect of x1 on y at the means of x2 and x3 by plugging in the means. To do this in Stata:

    // Get data
    webuse regress
    
    // Run the regression
    qui reg y c.x1##c.(x2 x3)
    
    // Get the sample means of x2 and x3 
    sum x2 if e(sample), meanonly
    scalar m_x2 = r(mean)
    sum x3 if e(sample), meanonly
    scalar m_x3 = r(mean)
    
    // Calculate partial effect
    lincom x1 + m_x2 * c.x1#c.x2 + m_x3*c.x1#c.x3
    

    Result:

    . lincom x1 + m_x2 * c.x1#c.x2 + m_x3*c.x1#c.x3
    
     ( 1)  x1 - .2972973*c.x1#c.x2 + 3019.459*c.x1#c.x3 = 0
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   1.409372   1.005254     1.40   0.163    -.5778255    3.396569
    ------------------------------------------------------------------------------
    

    As you can see, this is the same as the results obtained by margins:

    . qui reg y c.x1##c.(x2 x3)
    
    . margins, dydx(x1) atmeans
    
    Conditional marginal effects                    Number of obs     =        148
    Model VCE    : OLS
    
    Expression   : Linear prediction, predict()
    dy/dx w.r.t. : x1
    at           : x1              =    3.014865 (mean)
                   x2              =   -.2972973 (mean)
                   x3              =    3019.459 (mean)
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |   1.409372   1.005254     1.40   0.163    -.5778256    3.396569
    ------------------------------------------------------------------------------
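
    The match is expected rather than coincidental. For a linear prediction, the marginal effect at the means is a linear combination of the coefficients, and margins with atmeans treats the means as fixed by default, so its delta-method standard error reduces to the usual standard error of a linear combination. As a sketch, with notation introduced here only for illustration: b1 is the coefficient on x1, b4 and b5 are the coefficients on c.x1#c.x2 and c.x1#c.x3, m2 and m3 are the sample means of x2 and x3, and V is the estimated covariance matrix of (b1, b4, b5):

    ```latex
    \widehat{\theta} = \hat{b}_1 + m_2 \hat{b}_4 + m_3 \hat{b}_5 = c'\hat{b},
    \qquad c = (1,\; m_2,\; m_3)'

    \widehat{\operatorname{Var}}(\widehat{\theta}) = c' V c
      = V_{11} + m_2^2 V_{44} + m_3^2 V_{55}
      + 2 m_2 V_{14} + 2 m_3 V_{15} + 2 m_2 m_3 V_{45}
    ```

    Both commands evaluate this same quadratic form, which is why the coefficient and standard error agree up to rounding.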
    

    Here's a speed comparison showing that lincom is about 14 times faster than margins in this case, with 3 million observations:

    clear
    webuse regress
    expand 20271
    
    gen lincom = .
    gen margins = .
    qui reg y c.x1##c.(x2 x3)
    
    forval i = 1/50 {
    
        timer clear
        
        timer on 1
        sum x2 if e(sample), meanonly
        scalar m_x2 = r(mean)
        sum x3 if e(sample), meanonly
        scalar m_x3 = r(mean)
        lincom x1 + m_x2 * c.x1#c.x2 + m_x3*c.x1#c.x3
        timer off 1
    
        timer on 2
        margins, dydx(x1) atmeans
        timer off 2
        
        timer list
        replace lincom = r(t1) in `i'
        replace margins = r(t2) in `i'
    }
    
    ttest lincom == margins
    di "On average, lincom is " %4.2f `=r(mu_2) / r(mu_1)' " times faster than margins with `=_N' observations"
    // On average, lincom is 13.88 times faster than margins with 3000108 observations
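
    Since the question concerns a multi-equation model, the same approach should carry over by prefixing each coefficient with its equation name. The following is a minimal sketch only: it assumes sureg on Stata's auto example data, with variables chosen purely for illustration, and it is not the asker's gsem specification with clustered standard errors. After gsem, the coefficient names may differ, so it is worth replaying the results with the coeflegend option to see the exact names before writing the lincom expression.

    ```stata
    // Illustrative data and model (assumed, not from the question)
    sysuse auto, clear

    // Two-equation SUR with an interaction in each equation
    qui sureg (price c.weight##c.length) (mpg c.weight##c.length)

    // Sample mean of the interacting variable
    sum length if e(sample), meanonly
    scalar m_length = r(mean)

    // Partial effect of weight at the mean of length, one equation at a time
    lincom [price]weight + m_length*[price]c.weight#c.length
    lincom [mpg]weight + m_length*[mpg]c.weight#c.length
    ```

    Each lincom call only works with the stored coefficient vector and covariance matrix, so its cost should not grow with the number of observations once the model has been estimated.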