
R plm lag - what is the equivalent to L1.x in Stata?


I am using the plm package in R to fit a fixed-effects model. What is the correct syntax to add a lagged variable to the model, i.e. the equivalent of the 'L1.variable' operator in Stata?

Here is my attempt adding a lagged variable (this is a test model and it might not make sense):

library(plm)
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- plm.data(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp+lag(wks_work,1)
           , model = 'within'
           , data = nlswork)
summary(ffe)

R output:

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = nlswork, 
    model = "within")

Unbalanced Panel: n=3911, T=1-14, N=19619

Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-1.77000 -0.10100  0.00293  0.11000  2.90000 

Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)    
ttl_exp       0.02341057 0.00073832 31.7078 < 2.2e-16 ***
lag(wks_work) 0.00081576 0.00010628  7.6755 1.744e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    1296.9
Residual Sum of Squares: 1126.9
R-Squared:      0.13105
Adj. R-Squared: -0.085379
F-statistic: 1184.39 on 2 and 15706 DF, p-value: < 2.22e-16

However, these results differ from what Stata produces.

In my actual model, I would like to instrument an endogenous variable with its lagged value.

Thanks!

For reference, here is the Stata code:

webuse nlswork.dta
xtset idcode year
xtreg ln_wage ttl_exp L1.wks_work, fe

Stata output:

Fixed-effects (within) regression               Number of obs     =     10,680
Group variable: idcode                          Number of groups  =      3,671

R-sq:                                           Obs per group:
     within  = 0.1492                                         min =          1
     between = 0.2063                                         avg =        2.9
     overall = 0.1483                                         max =          8

                                                F(2,7007)         =     614.60
corr(u_i, Xb)  = 0.1329                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ttl_exp |   .0192578   .0012233    15.74   0.000     .0168597    .0216558
             |
    wks_work |
         L1. |   .0015891   .0001957     8.12   0.000     .0012054    .0019728
             |
       _cons |   1.502879   .0075431   199.24   0.000     1.488092    1.517666
-------------+----------------------------------------------------------------
     sigma_u |  .40678942
     sigma_e |  .28124886
         rho |  .67658275   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3670, 7007) = 4.71                  Prob > F = 0.0000

Solution

  • In older versions of plm, lag() shifts the observations row-wise within each individual, without "looking" at the time variable. If there are gaps in the time dimension, you probably want the lag to take the value of the time variable into account. There is the (as of now) unexported function plm:::lagt.pseries, which does take the time variable into account and hence handles gaps in the data as you might expect.

    Edit: Since plm version 1.7-0, the default behaviour of lag in plm is to shift time-wise. You can control this with the argument shift (shift = c("time", "row")) to shift either time-wise or row-wise (the old behaviour).

    Use it as follows:

    library(plm)
    library(foreign)
    nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
    pnlswork <- pdata.frame(nlswork, c('idcode', 'year'))
    ffe <- plm(ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,1)
               , model = 'within'
               , data = pnlswork)
    summary(ffe)
    
    Oneway (individual) effect Within Model
    
    Call:
    plm(formula = ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work, 
        1), data = nlswork, model = "within")
    
    Unbalanced Panel: n=3671, T=1-8, N=10680
    
    Residuals :
       Min. 1st Qu.  Median 3rd Qu.    Max. 
    -1.5900 -0.0859  0.0000  0.0957  2.5600 
    
    Coefficients :
                                      Estimate Std. Error t-value  Pr(>|t|)    
    ttl_exp                         0.01925775 0.00122330 15.7425 < 2.2e-16 ***
    plm:::lagt.pseries(wks_work, 1) 0.00158907 0.00019573  8.1186 5.525e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Total Sum of Squares:    651.49
    Residual Sum of Squares: 554.26
    R-Squared:      0.14924
    Adj. R-Squared: -0.29659
    F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
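
    To make the difference between row-wise and time-wise lagging concrete, here is a minimal sketch on a made-up toy panel. The data frame toy and its columns are hypothetical and not part of nlswork, and the sketch assumes plm >= 1.7-0 so that the shift argument of lag() is available. Individual 1 has no observation for 2003, so the two shifts disagree for its 2004 row:

    ```r
    library(plm)

    # Hypothetical toy panel (not from nlswork): id 1 has a gap in 2003
    toy <- data.frame(
      id   = c(1, 1, 1, 2, 2, 2),
      year = c(2001, 2002, 2004, 2001, 2002, 2003),
      x    = c(10, 20, 40, 1, 2, 3)
    )
    ptoy <- pdata.frame(toy, index = c("id", "year"))

    # Row-wise: takes the previous row of the same individual, ignoring the gap
    lag_row  <- lag(ptoy$x, k = 1, shift = "row")
    # Time-wise: takes the value from (year - 1); NA if that year is not observed
    lag_time <- lag(ptoy$x, k = 1, shift = "time")

    cbind(toy, lag_row = as.numeric(lag_row), lag_time = as.numeric(lag_time))
    # lag_row  is NA, 10, 20, NA, 1, 2   (2004 wrongly receives the 2002 value)
    # lag_time is NA, 10, NA, NA, 1, 2   (2004 is NA because 2003 is missing)
    ```

    The time-wise result is what Stata's L1. operator gives after xtset, which is why the row-wise lag produced a different sample and different estimates.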
    

    Btw1: Better use pdata.frame() instead of plm.data().

    Btw2: You can check for gaps in your data with plm's is.pconsecutive():

    is.pconsecutive(pnlswork)
    all(is.pconsecutive(pnlswork))
    

    You can also make the data consecutive first and then use lag(), like this:

    pnlswork2 <- make.pconsecutive(pnlswork)
    pnlswork2$wks_work_lag <- lag(pnlswork2$wks_work)
    ffe2 <- plm(ln_wage ~ ttl_exp + wks_work_lag
               , model = 'within'
               , data = pnlswork2)
    summary(ffe2)
    
    Oneway (individual) effect Within Model
    
    Call:
    plm(formula = ln_wage ~ ttl_exp + wks_work_lag, data = pnlswork2, 
        model = "within")
    
    Unbalanced Panel: n=3671, T=1-8, N=10680
    
    Residuals :
       Min. 1st Qu.  Median 3rd Qu.    Max. 
    -1.5900 -0.0859  0.0000  0.0957  2.5600 
    
    Coefficients :
                   Estimate Std. Error t-value  Pr(>|t|)    
    ttl_exp      0.01925775 0.00122330 15.7425 < 2.2e-16 ***
    wks_work_lag 0.00158907 0.00019573  8.1186 5.525e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Total Sum of Squares:    651.49
    Residual Sum of Squares: 554.26
    R-Squared:      0.14924
    Adj. R-Squared: -0.29659
    F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
    

    Or simply:

    ffe3 <- plm(ln_wage ~ ttl_exp + lag(wks_work)
                , model = 'within'
                , data = pnlswork2) # note: it is the consecutive panel data set here
    summary(ffe3)
    
    Oneway (individual) effect Within Model
    
    Call:
    plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = pnlswork2, 
        model = "within")
    
    Unbalanced Panel: n=3671, T=1-8, N=10680
    
    Residuals :
       Min. 1st Qu.  Median 3rd Qu.    Max. 
    -1.5900 -0.0859  0.0000  0.0957  2.5600 
    
    Coefficients :
                    Estimate Std. Error t-value  Pr(>|t|)    
    ttl_exp       0.01925775 0.00122330 15.7425 < 2.2e-16 ***
    lag(wks_work) 0.00158907 0.00019573  8.1186 5.525e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Total Sum of Squares:    651.49
    Residual Sum of Squares: 554.26
    R-Squared:      0.14924
    Adj. R-Squared: -0.29659
    F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
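
    As a small illustration of why making the data consecutive first works, the sketch below applies make.pconsecutive() to a made-up toy panel. The data frame toy is hypothetical, and the sketch assumes plm >= 1.7-0 for the shift argument. The missing period is filled with an NA row, after which a row-wise lag and a time-wise lag coincide:

    ```r
    library(plm)

    # Hypothetical toy panel (not from nlswork): id 1 has a gap in 2003
    toy <- data.frame(
      id   = c(1, 1, 1, 2, 2, 2),
      year = c(2001, 2002, 2004, 2001, 2002, 2003),
      x    = c(10, 20, 40, 1, 2, 3)
    )
    ptoy <- pdata.frame(toy, index = c("id", "year"))

    # One logical value per individual: FALSE for id 1, TRUE for id 2
    is.pconsecutive(ptoy)

    # Insert the missing (id = 1, year = 2003) row, with x set to NA
    ptoy2 <- make.pconsecutive(ptoy)
    nrow(ptoy2)                  # 7 rows instead of 6
    all(is.pconsecutive(ptoy2))  # TRUE

    # On the consecutive panel, both ways of lagging give the same series
    lag_row2  <- lag(ptoy2$x, shift = "row")
    lag_time2 <- lag(ptoy2$x, shift = "time")
    all.equal(as.numeric(lag_row2), as.numeric(lag_time2))
    ```

    The inserted rows carry NA for every variable, so they are dropped again when the model is estimated and do not add observations.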