Tags: r, dplyr, lag

Can dplyr::lag() leave values intact when no lag value exists?


My data are grouped by mpg, and I want to update the values in hp based on the prior hp value within each group. I don't want the first row of each group, which has no prior value, to become NA. I'd like that row to keep its own value.

library(dplyr)

ds <- structure(list(mpg = c(10.4, 10.4, 15.2, 15.2, 19.2, 19.2, 21, 
21), hp = c(205, 215, 180, 150, 123, 175, 110, 110)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -8L), .Names = c("mpg", 
"hp"))

ds %>% 
  group_by(mpg) %>% 
  mutate(hp = lag(hp))
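
To make the problem concrete, here is a small self-contained sketch on a cut-down copy of the data (the names `small` and `lagged` are made up for this illustration). The first row of each group has no prior value, so it comes back as NA.

```r
library(dplyr)

# Two groups only, taken from the first four rows of ds
small <- tibble(mpg = c(10.4, 10.4, 15.2, 15.2),
                hp  = c(205, 215, 180, 150))

lagged <- small %>%
  group_by(mpg) %>%
  mutate(hp = lag(hp)) %>%   # shift hp down by one row within each group
  ungroup()

lagged$hp
# NA 205 NA 180
```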

Failed attempt: passing the whole hp column as default does not work, because default expects a single value rather than a vector as long as the group.

  ds %>% 
    group_by(mpg) %>% 
    mutate(hp = lag(hp, default = hp))

Solution

  • You need to subset the vector used as default so that it is a single value. Assuming a lag of 1, use head(hp, 1), which is the first hp value in each group.

    ds %>% 
      group_by(mpg) %>% 
      mutate(hp = lag(hp, default = head(hp, 1)))
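
As a quick check, the sketch below runs the head() approach on the question's data and compares it with one possible alternative, dplyr::coalesce(), which falls back to the row's own value wherever lag() returns NA. The coalesce() variant is not part of the original answer, and the names `res_head` and `res_coalesce` are made up for this illustration.

```r
library(dplyr)

ds <- tibble(
  mpg = c(10.4, 10.4, 15.2, 15.2, 19.2, 19.2, 21, 21),
  hp  = c(205, 215, 180, 150, 123, 175, 110, 110)
)

# Approach from the answer: use the group's first value as the default
res_head <- ds %>%
  group_by(mpg) %>%
  mutate(hp = lag(hp, default = head(hp, 1))) %>%
  ungroup()

# Alternative sketch: replace the NA from lag() with the row's own value
res_coalesce <- ds %>%
  group_by(mpg) %>%
  mutate(hp = coalesce(lag(hp), hp)) %>%
  ungroup()

res_head$hp
# 205 205 180 180 123 123 110 110
res_coalesce$hp
# 205 205 180 180 123 123 110 110
```

With a lag of 1 the two give the same result, because the only row without a prior value is the first one in each group. For a larger lag, coalesce(lag(hp, n), hp) would let each of the first n rows keep its own value, while head(hp, 1) would fill them all with the group's first value. One caveat: coalesce() would also overwrite an NA that was genuinely present in the lagged data, so it suits columns without missing values.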