Create lag, lead and diff variables in plm dataframe


I am attempting to do some panel analysis using lagged, leading and differenced variables. However, the plm functions do not give the desired results because they do not operate within each individual: the lag runs straight across individual boundaries. I looked online, but the approach in the following post (Answer_Stack), using pdata.frame(), gave the same problematic results. When I group_by(i) in dplyr I get the desired result. Can anyone explain what is going on?

# Packages
library(plm)
library(dplyr)

# Variables
i <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7)
t <- c(2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003)
y <- c(0.047136, 0.044581, 0.040973, 0.045536, 0.043952, 0.038797, 0.049942, 0.047440, 0.042193, 0.048503, 0.046816, 0.040292, 0.056089, 0.052054, 0.047078, 0.044223, 0.041516, 0.036947, 0.045608, 0.042028, 0.037878)
x <- c(0.32691, 0.33013, 0.32888, 0.40301, 0.40337, 0.40326, 0.29692, 0.29982, 0.29790, 0.30380, 0.30698, 0.30668, 0.27942, 0.28696, 0.28616, 0.31218, 0.31424, 0.31382, 0.34592, 0.34738, 0.34782)

# Create plm dataframe
dta <- data.frame(i, t, y, x)
pdta <- plm.data(dta, indexes = c("i", "t"))

# Create lagged variable with plm
pdta$l.x <- lag(pdta$x)             # Does not work

# Create using dplyr
pdta <- pdta %>%
  group_by(i) %>%
  mutate(lag.x = lag(x))

View(pdta)

Note on the answer: even after following the suggested steps, I get this:

    > pdta <- pdata.frame(dta, index = c("i", "t"))
    > head(cbind(pdta$i, pdta$y, lag(pdta$y)), 10)
           [,1]     [,2]     [,3]
    1-2001    1 0.047136       NA
    1-2002    1 0.044581 0.047136
    1-2003    1 0.040973 0.044581
    2-2001    2 0.045536 0.040973
    2-2002    2 0.043952 0.045536
    2-2003    2 0.038797 0.043952
    3-2001    3 0.049942 0.038797
    3-2002    3 0.047440 0.049942
    3-2003    3 0.042193 0.047440
    4-2001    4 0.048503 0.042193

Solution

  • First, plm.data is not the right function for converting the data.frame into a pdata.frame. plm.data returns a data.frame which can be used directly in the estimator functions, but is not directly amenable to the data transformation functions. Use pdata.frame instead:

    pdta <- pdata.frame(dta, index= c("i", "t"))
    

    Then give lag a try:

    head(cbind(pdta$i, pdta$y, lag(pdta$y)), 10)
           [,1]     [,2]     [,3]
    1-2001    1 0.047136       NA
    1-2002    1 0.044581 0.047136
    1-2003    1 0.040973 0.044581
    2-2001    2 0.045536       NA
    2-2002    2 0.043952 0.045536
    2-2003    2 0.038797 0.043952
    3-2001    3 0.049942       NA
    3-2002    3 0.047440 0.049942
    3-2003    3 0.042193 0.047440
    4-2001    4 0.048503       NA
    

    Alternatively, you can wrap the plm.data result in pdata.frame:

    pdta <- pdata.frame(plm.data(dta, indexes= c("i", "t")))
    

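  • Second, as we figured out in the comments, loading dplyr masks a number of base R functions, including stats::lag. plm builds on these base R functions to perform the desired operations on its pdata.frame objects, so once dplyr is attached a bare lag() call resolves to dplyr::lag, which knows nothing about the panel index and simply shifts the whole column. That is exactly the output shown in your note. As helix123 mentions, even with dplyr loaded, you can refer to the plm implementation using plm::<function name>, e.g. plm::lag.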
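
To tie the two points together, here is a minimal sketch that builds all three variables from the title (lag, lead and diff) with dplyr attached. It reuses the x values from the question. The column names lag.x, lead.x and diff.x are my own choice, and I am assuming a plm version in which plm::lag and plm::lead can be called with the namespace prefix, as suggested in the comments.

```r
library(plm)
library(dplyr)  # attached after plm, so a bare lag() now means dplyr::lag()

# Same panel as in the question: 7 individuals observed in 2001-2003
dta <- data.frame(
  i = rep(1:7, each = 3),
  t = rep(2001:2003, times = 7),
  x = c(0.32691, 0.33013, 0.32888, 0.40301, 0.40337, 0.40326,
        0.29692, 0.29982, 0.29790, 0.30380, 0.30698, 0.30668,
        0.27942, 0.28696, 0.28616, 0.31218, 0.31424, 0.31382,
        0.34592, 0.34738, 0.34782)
)

# pdata.frame (not plm.data) so that x carries the panel index
pdta <- pdata.frame(dta, index = c("i", "t"))

# Namespace prefix forces the panel-aware versions despite dplyr's masking
pdta$lag.x  <- plm::lag(pdta$x)   # previous period within each individual
pdta$lead.x <- plm::lead(pdta$x)  # next period within each individual

# diff() is not masked by dplyr, so it dispatches on the panel series as usual
pdta$diff.x <- diff(pdta$x)       # x minus its own lag within each individual

head(pdta, 6)
```

If this behaves as intended, lag.x and diff.x should be NA in the first year of every individual and lead.x should be NA in the last year, rather than picking up values from the neighbouring individual.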