r dplyr lag

Populating a new column based on the prior value of that same column


I have an existing dataframe with 2 columns. I want to create a third column (s) based on these specifications:

Current value of s = prior value of s + [prior value of s * current value of X2]

As a test, I have managed to populate s from the other two columns, as shown here. Of course, this does not use the prior value of s itself.

library(dplyr)
set.seed(123)
a <- data.frame(replicate(2, sample(1:10, 10, replace = TRUE)))
a %>% mutate(s = lag(X1, default = 1) + lag(X1, default = 1) * X2)

   X1 X2   s
1   3 10  11
2   8  5  18
3   5  7  64
4   9  6  35
5  10  2  27
6   1  9 100
7   6  3   4
8   9  1  12
9   6  4  45
10  5 10  66

Ideally, I would need something like this:

library(dplyr)
set.seed(123)
a <- data.frame(replicate(2, sample(1:10, 10, replace = TRUE)))
a %>% mutate(s = lag(s, default = 1) + lag(s, default = 1) * X2)

   X1 X2    s
1   3 10   11
2   8  5   66
3   5  7  528
4   9  6 3696

But this fails because s does not exist yet when mutate() evaluates lag(s).

Any advice is much appreciated.

Regards, Aksel


Solution

  • Since each value of s depends on the one before it, this is not a bad time to use a for loop:

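    a$s <- 0
    for (i in seq_len(nrow(a))) {
      if (i == 1) {
        # First row: the prior value of s is taken to be 1, so s = 1 + 1 * X2
        a$s[i] <- a$X2[i] + 1
      } else {
        # Later rows: s = prior s + prior s * current X2
        a$s[i] <- a$s[i - 1] + a$s[i - 1] * a$X2[i]
      }
    }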
    
    
    a
    #   X1 X2        s
    #1   3 10       11
    #2   8  5       66
    #3   5  7      528
    #4   9  6     3696
    #5  10  2    11088
    #6   1  9   110880
    #7   6  3   443520
    #8   9  1   887040
    #9   6  4  4435200
    #10  5 10 48787200
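
As a possible alternative to the loop, note that the recurrence s[i] = s[i-1] + s[i-1] * X2[i] can be rewritten as s[i] = s[i-1] * (1 + X2[i]), so with a starting value of 1 it should reduce to a running product. The sketch below is not part of the original answer. It hard-codes X1 and X2 from the output printed above (an assumption made so the example does not depend on the random number generator), and the column names s_cumprod and s_reduce are made up for illustration.

```r
# X1 and X2 copied from the printed output above, so the sketch is
# self-contained and does not depend on set.seed() behaviour.
a <- data.frame(
  X1 = c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5),
  X2 = c(10, 5, 7, 6, 2, 9, 3, 1, 4, 10)
)

# s[i] = s[i-1] * (1 + X2[i]) with a starting value of 1,
# i.e. the cumulative product of (1 + X2).
a$s_cumprod <- cumprod(1 + a$X2)

# More general form for recurrences without a closed form:
# Reduce() carries the previous result forward; accumulate = TRUE keeps
# every intermediate value, and [-1] drops the initial seed of 1.
a$s_reduce <- Reduce(function(prev, x) prev + prev * x, a$X2,
                     init = 1, accumulate = TRUE)[-1]

a
```

Inside a dplyr pipeline the same idea would presumably be written as a %>% mutate(s = cumprod(1 + X2)). The Reduce() form is slower but works for any rule that depends only on the previous value and the current row.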