How do I mimic the drag functionality for new rows such as in Excel but for R?


I am struggling with lead() and lag(), which do not update dynamically/sequentially as each row is filled.

I want to fill a new column starting from an initial value, then fill each subsequent row based on the previously filled rows. In Excel this is done by entering the formula in a cell and dragging it down. How do I do this in R?

See the example below:

       x     y     z
   <dbl> <dbl> <dbl>
 1     1     1     1
 2     2     3    NA
 3     3     5    NA
 4     4     7    NA
 5     5     9    NA
 6     6    11    NA
 7     7    13    NA
 8     8    15    NA
 9     9    17    NA
10    10    19    NA 

The desired output follows this calculation, where the subscript t-1 denotes the value from the previous row: Z_t = Z_{t-1} + X_{t-1} - Y_{t-1}.

Desired output

       x     y     z
   <dbl> <dbl> <dbl>
 1     1     1     1
 2     2     3     1
 3     3     5     0
 4     4     7    -2
 5     5     9    -6
 6     6    11    -12
 7     7    13    -18
 8     8    15    -25
 9     9    17    -33
10    10    19    -42 

Solution

  • Note that the desired result in your question does not match the output of the formula you describe.

    In this specific case, you can get your answer by using cumsum (cumulative sum):

    cumsum(df1$x - df1$y + 1)
     [1]   1   1   0  -2  -5  -9 -14 -20 -27 -35
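
The shortcut above reproduces the recurrence only because of the particular values of x and y in this example. As a sketch of a form that follows Z_t = Z_{t-1} + X_{t-1} - Y_{t-1} for any x and y, the increments can be shifted down one row before taking the cumulative sum. The data frame is rebuilt here from the table in the question, and the names `increments` and `z_vec` are introduced purely for illustration:

```r
# Rebuild the example data from the question
df1 <- data.frame(x = 1:10, y = seq(1, 19, by = 2), z = c(1, rep(NA, 9)))

# Shift (x - y) down one row so that row t sees the values from row t-1;
# the first row has no predecessor, so its increment is 0
increments <- c(0, head(df1$x - df1$y, -1))

# z[t] = z[1] + sum of all increments up to and including row t
z_vec <- df1$z[1] + cumsum(increments)
z_vec
#>  [1]   1   1   0  -2  -5  -9 -14 -20 -27 -35
```

This only works because the recurrence is purely additive; a rule that multiplies or otherwise transforms the previous value cannot be rewritten as a plain cumsum.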
    

    However, in general, if you wish to apply a function recursively based on the output of the previous result, you need to write a loop to do it (or use a function that uses a loop "under the hood", as akrun shows with accumulate). A simple loop in base R that achieves your result would be:

    for(i in 2:nrow(df1)) df1$z[i] <- df1$z[i-1] + df1$x[i-1] - df1$y[i-1] 
    
    df1
    #>     x  y   z
    #> 1   1  1   1
    #> 2   2  3   1
    #> 3   3  5   0
    #> 4   4  7  -2
    #> 5   5  9  -5
    #> 6   6 11  -9
    #> 7   7 13 -14
    #> 8   8 15 -20
    #> 9   9 17 -27
    #> 10 10 19 -35
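
For comparison, here is a minimal sketch of the "loop under the hood" approach using base R's Reduce() with accumulate = TRUE. This is a base-R stand-in for the accumulate function mentioned above, not a reproduction of that answer. It assumes the same df1 as in the question and walks over row indices, so the update rule can be swapped for any function of the previous result:

```r
# Rebuild the example data from the question
df1 <- data.frame(x = 1:10, y = seq(1, 19, by = 2), z = c(1, rep(NA, 9)))

# For each row index i = 2, ..., n, compute z[i] from the previous result.
# accumulate = TRUE keeps every intermediate value; init supplies z[1].
df1$z <- Reduce(
  function(z_prev, i) z_prev + df1$x[i - 1] - df1$y[i - 1],
  seq_len(nrow(df1))[-1],
  accumulate = TRUE,
  init = df1$z[1]
)

df1$z
#>  [1]   1   1   0  -2  -5  -9 -14 -20 -27 -35
```

The function passed to Reduce() receives the running value and the next row index, which mirrors what dragging a formula down does in Excel: each cell is computed from the one above it.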
    

    When you can use a function like cumsum that uses C-based vectorization rather than loops within R, it is likely to be a more efficient solution.