Search code examples
rtidyverselagdplyr

How to use lag/lead in mutate with an initial value?


Sample df:

library(tidyverse)

iris <- iris[1:10,]
iris$testlag <- NA
iris[[1,"testlag"]] <- 5

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
1           5.1         3.5          1.4         0.2  setosa       5
2           4.9         3.0          1.4         0.2  setosa      NA
3           4.7         3.2          1.3         0.2  setosa      NA
4           4.6         3.1          1.5         0.2  setosa      NA
5           5.0         3.6          1.4         0.2  setosa      NA
6           5.4         3.9          1.7         0.4  setosa      NA
7           4.6         3.4          1.4         0.3  setosa      NA
8           5.0         3.4          1.5         0.2  setosa      NA
9           4.4         2.9          1.4         0.2  setosa      NA
10          4.9         3.1          1.5         0.1  setosa      NA

In the testlag column, I'm interesting in using dplyr::lag() to retrieve the previous value and add some column, for example Petal.Length to it. As I have only one initial value, each subsequent calculation requires it to work iteratively, so I thought something like mutate would work.

I first tried doing something like this:

iris %>% mutate_at("testlag", ~ lag(.) + Petal.Length)

But this removed the first value, and only gave a valid value for the second row and NAs for the rest. Intuitively I know why it's removing the first value, but I thought the nature of mutate would allow it to work for the rest of the values, so I don't know how to fix that.

Of course using base R I could something like:

for (idx in 2:nrow(iris)) {
  iris[[idx, "testlag"]] <-
    lag(iris$testlag)[idx] + iris[[idx, "Petal.Length"]]
}

But I would prefer to implement this in tidyverse syntax.

Edit: Desired output (from my for loop)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
1           5.1         3.5          1.4         0.2  setosa     5.0
2           4.9         3.0          1.4         0.2  setosa     6.4
3           4.7         3.2          1.3         0.2  setosa     7.7
4           4.6         3.1          1.5         0.2  setosa     9.2
5           5.0         3.6          1.4         0.2  setosa    10.6
6           5.4         3.9          1.7         0.4  setosa    12.3
7           4.6         3.4          1.4         0.3  setosa    13.7
8           5.0         3.4          1.5         0.2  setosa    15.2
9           4.4         2.9          1.4         0.2  setosa    16.6
10          4.9         3.1          1.5         0.1  setosa    18.1

Solution

  • Does this work for you?

    library(tidyverse)
    library("data.table")
    
    iris <- iris[1:10,]
    iris$testlag <- NA
    iris[[1,"testlag"]] <- 5
    
    iris %>% mutate (testlag = lag(first(testlag) + cumsum(Petal.Length)))
    

    Result:

       Sepal.Length Sepal.Width Petal.Length Petal.Width Species testlag
    1           5.1         3.5          1.4         0.2  setosa      NA
    2           4.9         3.0          1.4         0.2  setosa     6.4
    3           4.7         3.2          1.3         0.2  setosa     7.8
    4           4.6         3.1          1.5         0.2  setosa     9.1
    5           5.0         3.6          1.4         0.2  setosa    10.6
    6           5.4         3.9          1.7         0.4  setosa    12.0
    7           4.6         3.4          1.4         0.3  setosa    13.7
    8           5.0         3.4          1.5         0.2  setosa    15.1
    9           4.4         2.9          1.4         0.2  setosa    16.6
    10          4.9         3.1          1.5         0.1  setosa    18.0
    

    Since technically there is no N-1 Petal length when N = 1, I left the first value of testlag NA. Do you really need it to be initial value? If you need, this will work:

    iris %>% mutate (testlag = lag(first(testlag) + cumsum(Petal.Length), default=first(testlag)))