Tags: r, lag, finance, panel-data, performanceanalytics

Creating lagged (t-1) independent variables in Panel data


Say I want to estimate the predictive model Return_t = x + Volume_t-1 + Volatility_t-1 + e. I have five years of weekly panel data for 28 companies, already prepared in Excel, that looks like this:

ID  Date        Return      Volume       Volatility
1   2012-01-10  0.039441572 0.6979594    0.2606079
1   2012-01-17 -0.021107681 0.6447289    0.3741519
1   2012-01-24  0.004798082 1.0072677    0.3097104
1   2012-01-31  0.001559987 1.0066153    0.2761096
1   2012-02-07 -0.009058289 0.7218983    0.2592109
1   2012-02-14  0.046404936 1.2879986    0.4304542
2   2012-01-10  0.02073912 -0.141970906  0.2573633
2   2012-01-17 -0.00369127  0.007792180  0.3360240
2   2012-01-24 -0.05881038  0.001347634  0.2163933
2   2012-01-31 -0.05664598  0.640085029  0.3545598
2   2012-02-07  0.03654193  0.360513703  0.3594383
2   2012-02-14  0.03092432  0.105669775  0.3043643

I want to lag the independent variables to t-1. Which package allows me to do that in R? I am going to run a panel data regression with fixed effects.


Solution

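  • After grouping by 'ID', we can use lag from dplyr. lag works on row order, so the rows should be sorted by Date within each ID first:

    library(dplyr)
    df1 <- df1 %>%
      arrange(ID, Date) %>%   # lag() follows row order, so sort first
      group_by(ID) %>%        # keeps lags from crossing company boundaries
      mutate(Volume_1 = lag(Volume), Volatility_1 = lag(Volatility)) %>%
      ungroup()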
    

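    Another option is shift from data.table, which lags by one period by default:

    library(data.table)
    nm1 <- c("Volume", "Volatility")   # columns to lag
    # setDT converts df1 in place; := adds Volume_1 and Volatility_1 by reference
    setDT(df1)[, paste0(nm1, "_1") := lapply(.SD, shift), by = ID, .SDcols = nm1]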
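
To check that the lag behaves as intended, here is a minimal sketch that rebuilds the twelve sample rows from the question, applies the dplyr approach, and then fits the regression. The object names lagged and fit are illustrative, and using lm with factor(ID) dummies as the fixed-effects estimator is an assumption on my part, not something the question specifies.

```r
library(dplyr)

# The twelve sample rows shown in the question (two companies, six weeks each)
dates <- as.Date(c("2012-01-10", "2012-01-17", "2012-01-24",
                   "2012-01-31", "2012-02-07", "2012-02-14"))
df1 <- data.frame(
  ID = rep(1:2, each = 6),
  Date = rep(dates, times = 2),
  Return = c(0.039441572, -0.021107681, 0.004798082,
             0.001559987, -0.009058289, 0.046404936,
             0.02073912, -0.00369127, -0.05881038,
             -0.05664598, 0.03654193, 0.03092432),
  Volume = c(0.6979594, 0.6447289, 1.0072677,
             1.0066153, 0.7218983, 1.2879986,
             -0.141970906, 0.007792180, 0.001347634,
             0.640085029, 0.360513703, 0.105669775),
  Volatility = c(0.2606079, 0.3741519, 0.3097104,
                 0.2761096, 0.2592109, 0.4304542,
                 0.2573633, 0.3360240, 0.2163933,
                 0.3545598, 0.3594383, 0.3043643)
)

lagged <- df1 %>%
  arrange(ID, Date) %>%                        # lag() depends on row order
  group_by(ID) %>%                             # lag within each company only
  mutate(Volume_1 = dplyr::lag(Volume),
         Volatility_1 = dplyr::lag(Volatility)) %>%
  ungroup()

# The first week of each company has no previous week, so its lags are NA
print(lagged)

# Fixed effects via company dummies (least-squares dummy variables).
# lm() drops the rows with NA lags under its default na.action.
fit <- lm(Return ~ Volume_1 + Volatility_1 + factor(ID), data = lagged)
print(coef(fit))
```

The first row of each ID comes out as NA rather than borrowing the last value of the previous company, which is the point of grouping before lagging. With only twelve rows the coefficients are not meaningful; the sketch only shows the mechanics. Dedicated panel packages such as plm also provide a within estimator, which is not shown here.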