Create multiple lagged variables with different offsets


I have a data frame with a column 't'. I want to create several columns that contain the 't' column lagged by 1, 2, ..., n steps. The new column names should indicate the number of steps by which they have been lagged, e.g. "t-1", "t-2", etc.

    year    t  t-1  t-2
19620101    1   NA   NA
19630102    2    1   NA
19640103    3    2    1
19650104    4    3    2
19650104    5    4    3
19650104    6    5    4
19650104    6   5   4

My idea is to do it in four steps:

  • A loop for the column names using "paste"
  • A loop for the temporary dataframes for lagged columns using "paste"
  • A loop for creating the lagged columns
  • cbind them.

But I am not able to proceed with the code. Something rough:

library(zoo)

lagged <- function(df, n) {
   df <- zoo(df)
   lags <- paste("A", 1:n, sep = "_")   # intended names for the lagged columns
   for (i in 1:n) {
     odd <- as.data.frame(lag(df$OBS_Q, -1 * i, na.pad = TRUE))
     # TODO: cbind the lagged columns here
   }
}

df_final <- lagged(df = odd, n = 3)

I am stuck writing this function. Could you please show me how to do this, or suggest a simpler way?

Reference: Basic lag in R vector/dataframe


Addendum:

Real data:

x<-structure(list(DATE = 19630101:19630104, PRECIP = c(0, 0, 0,0), 
               OBS_Q = c(1.61, 1.48, 1.4, 1.33), swb = c(1.75, 1.73, 1.7,1.67), 
               gr4j = c(1.9, 1.77, 1.67, 1.58), isba = c(0.83, 0.83,0.83, 0.83), 
               noah = c(1.31, 1.19, 1.24, 1.31), sac = c(1.99,1.8, 1.66, 1.57), 
               swap = c(1.1, 1.05, 1.08, 0.99), vic.mm.day. = c(2.1,1.75, 1.55, 1.43)), 
          .Names = c("DATE", "PRECIP", "OBS_Q", "swb","gr4j", "isba", "noah", "sac", "swap", "vic.mm.day."), 
          class = c("data.table","data.frame"), row.names = c(NA, -4L))

The column to be lagged is OBS_Q.


Solution

  • If you are looking for efficiency, try data.table's new shift function:

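    library(data.table) # v >= 1.9.5
    df <- data.frame(t = 1:6) # example data matching the question's t column
    n <- 2
    # add the lagged columns "t-1", ..., "t-n" by reference
    setDT(df)[, paste0("t-", 1:n) := shift(t, 1:n)][]
    #    t t-1 t-2
    # 1: 1  NA  NA
    # 2: 2   1  NA
    # 3: 3   2   1
    # 4: 4   3   2
    # 5: 5   4   3
    # 6: 6   5   4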
    

    You can give the new columns any names you like through the character vector on the left-hand side of :=. You also do not need to bind the result back to the original, because the := operator updates your data set by reference.
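
The same idea could be applied to the OBS_Q column from the question's real data. The following is a minimal sketch, assuming data.table is installed. The helper name add_lags is hypothetical (it is not part of data.table), and only two of the original columns are reproduced to keep the example short. Unlike the one-liner in the solution, this sketch works on a copy so that the input is left untouched.

```r
library(data.table)

# Hypothetical helper (not part of data.table): returns a copy of `dt`
# with n extra columns holding `col` lagged by 1, 2, ..., n rows.
add_lags <- function(dt, col, n) {
  dt <- copy(as.data.table(dt))               # copy so the caller's data is not modified
  new_names <- paste0(col, "-", seq_len(n))   # e.g. "OBS_Q-1", "OBS_Q-2"
  # shift() returns one lagged vector per offset; := assigns them by name
  dt[, (new_names) := shift(.SD[[1L]], seq_len(n)), .SDcols = col]
  dt[]
}

# Subset of the question's real data (DATE and OBS_Q only)
x <- data.table(DATE  = 19630101:19630104,
                OBS_Q = c(1.61, 1.48, 1.4, 1.33))

res <- add_lags(x, "OBS_Q", 2)
# res now has the columns DATE, OBS_Q, OBS_Q-1 and OBS_Q-2;
# OBS_Q-1 holds NA, 1.61, 1.48, 1.4 and OBS_Q-2 holds NA, NA, 1.61, 1.48
```

Because the new column names contain a hyphen, they need backticks or [[ ]] when accessed, for example res[["OBS_Q-1"]]. If modifying the data in place is acceptable, the copy() call could be dropped and the update would then happen by reference, as in the solution above.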