Replace NA with the original value when using the lag() function in R


I am using dplyr's lag() function, and I am trying to figure out how to keep the original value, instead of NA, in the cells that have no lagged value.

Here is my code:

library(dplyr)
library(tidyr)
library(tibble)

# data_frame() is deprecated; tibble() is its replacement
df <- tibble(d1 = runif(10, 1, 5),
             d2 = runif(10, 2, 6),
             d3 = runif(10, 3, 7),
             d4 = runif(10, 4, 8),
             d5 = runif(10, 5, 9),
             d6 = runif(10, 6, 10),
             d7 = runif(10, 7, 11),
             d8 = runif(10, 8, 12)) %>% rownames_to_column()

df %>%
  gather(key = "col", value = "val", -"rowname") %>%
  group_by(col) %>%
  mutate(new_col = ifelse(val >= lag(val, 2) + lag(val, 2)*0.4, NA, val))

This attempt, which passes val as the default to lag(), doesn't work (which, honestly, I half expected):

df %>%
      gather(key = "col", value = "val", -"rowname") %>%
      group_by(col) %>%
      mutate(new_col = if_else(val >= lag(val, 2, default = val) + lag(val, 2, default = val)*0.4, NA, val))

What am I missing to arrive at this result?

   rowname col     val new_col
   <chr>   <chr> <dbl>   <dbl>
 1 1       d1     1.31   **1.31**   
 2 2       d1     4.10   **4.10**   
 3 3       d1     3.81   NA   
 4 4       d1     4.52    4.52
 5 5       d1     3.89    3.89
 6 6       d1     1.01    1.01
 7 7       d1     2.68    2.68
 8 8       d1     2.81   NA   
 9 9       d1     1.18    1.18
10 10      d1     1.19    1.19
# ... with 70 more rows

Appreciate any help!


Solution

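  • With a lag of n, the first n rows of each group have no lagged value, so the comparison returns NA there. You could replace those first n values of new_col with the original values.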

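    library(dplyr)
    n <- 2  # lag distance; the first n rows of each group have no lagged value

    df %>%
      tidyr::pivot_longer(cols = -rowname, values_to = 'val', names_to = 'col') %>%
      group_by(col) %>%
      mutate(new_col = if_else(val >= lag(val, n) + lag(val, n)*0.4, NA_real_, val),
             # restore the original values in the first n rows of each group
             new_col = replace(new_col, seq_len(n), val[seq_len(n)]))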
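
  • A possible alternative, as a sketch rather than a definitive fix: compute the lagged value once and only blank out val when a lagged value actually exists. The toy data and the helper column prev below are made up for illustration and are not part of the question. The attempt with default = val most likely fails because lag() expects default to be a single value, not a whole vector. Note that prev * 1.4 is the same threshold as lag(val, n) + lag(val, n)*0.4, up to floating-point rounding.

```r
library(dplyr)

# Hypothetical toy data: a single group with hand-picked values
toy <- tibble(col = "d1", val = c(1, 4, 3, 4.5, 3.9, 1, 2.7, 5))
n <- 2

res <- toy %>%
  group_by(col) %>%
  mutate(
    prev    = lag(val, n),  # NA for the first n rows of each group
    # Set NA only when a lagged value exists and val is at least 40% above it;
    # rows without a lagged value keep their original val
    new_col = if_else(!is.na(prev) & val >= prev * 1.4, NA_real_, val)
  ) %>%
  ungroup()

print(res)
```

    With this toy input, rows 3 and 8 become NA and every other row, including the first two, keeps its original value. Drop the prev column afterwards with select(-prev) if it is not needed.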