Search code examples
rdataframelag

IF THEN on a Dataframe in r with LAG


I have a dataframe with multiple columns, but two columns in particular are interesting for me. Column1 contains values 0 and a number (>0) Column2 contains numbers as well.

I want to create 21 new columns containing new information from Column2 given Column1.

So when Column1 is positive (not 0) I want the first new column, Column01, to take the value from Column2 that goes 10 back. and Column02 goes 9 back,.. Column11 is the exact same as Column2 value.. and Column21 is 10 forward.

For example

  Column 1  Column2   Columns01 Columns02.. Columns11..Columns20 Columns21
      0        5          0         0           0          0         0
      0        2          0         0           0          0         0 
      0        0          0         0           0          0         0  
      1        3          0         0           3          5         4
      0        10         0         0           0          0         0
      0        83         0         0           0          0         0
      0        2          0         0           0          0         0
      0        5          0         0           0          0         0
      0        4          0         0           0          0         0
      1        8          0         5           8          5         3
      0        6          0         0           0          0         0
      0        5          0         0           0          0         0
      0        55         0         0           0          0         0
      0        4          0         0           0          0         0
      2        3          10       83           3          5         0
      0        2          0         0           0          0         0
      0        3          0         0           0          0         0
      0        4          0         0           0          0         0
      0        5          0         0           0          0         0
      0        3          0         0           0          0         0
      1        22         6         5          22          0         0
      0        12         0         0           0          0         0
      0        0          0         0           0          0         0
      0        5          0         0           0          0         0

Hope this makes sense to you and you can help.


Solution

  • Here's one way using the newly implemented shift() function from data.table v1.9.5:

    require(data.table) ## v1.9.5+
    setDT(dat)                                                      ## (1)
    cols = paste0("cols", sprintf("%.2d", 1:21))                    ## (2)
    dat[, cols[1:10] := shift(Column2, 10:1, fill=0)]               ## (3)
    dat[, cols[11] := Column2]                                      ## (4)
    dat[, cols[12:21] := shift(Column2, 1:10, fill=0, type="lead")] ## (5)
    dat[Column1 == 0, (cols) := 0]                                  ## (6)
    
    1. Assuming dat is your data.frame, setDT(dat) converts it to a data.table, by reference (the data is not copied physically to a new location in memory, for efficiency).

    2. Generate all the column names.

    3. Generated lagged vectors of Column2 with periods 10:1 and assign it to the first 10 columns.

    4. 11th column is = Column2.

    5. Generated leading vectors of Column2 with periods 1:10 and assign it to the last 10 columns.

    6. Get indices of all the rows where Column1 == 0, and replace/reset all newly generated columns for those indices to 0.

    Use setDF(dat) if you want a data.frame back.

    You can wrap this in a function with the values -10:10 and choosing type="lag" or type="lead" accordingly, depending on whether the values are negative or positive.. I'll leave that to you.