Search code examples
rdplyrlag

Can not fill 2 concecutive rows using dplyr lag function


I use R and R package dplyr. I want in data frame fill missing values from previous row value (increase it by 1). It works well if I have a gap of one, but if there are 2 or more consecutive missing values then I have a problem. Let say here I have a data frame and for some rows it values are missing.

  id val1 val2
1  0    0    0
2  1    1    1
3  2   NA    NA
4  3    3    3
5  4   NA   NA
6  5   NA   NA
7  6    6    6

Here val2 is for which I perform the increase and val1 is just original value. For NA values I want to take previous row value and increase it by 1. I use dplyr package and lag function.
Here is the code:

library(dplyr)
tmp.data <- data.frame(id = c(0:6),val1 = c(0:6)) %>%
  mutate(
    val1 = ifelse( val1 %in% c(2,4:5),NA,val1),
    val2 = val1,
  ) 

tmp.data <- tmp.data %>%
  mutate(
    val2 = ifelse(
      is.na(val2),
      lag(val2) + 1,
      val2
      )
  )

Solution

  • Solution 1a: Assuming increase from previous missing value(s), with 2 consecutive missing rows

    Using dplyr:

    tmp.data %>%
      mutate(val2 = ifelse(is.na(val2), lag(val2) + 1, val2),
             val2 = ifelse(is.na(val2), lag(val2) + 1, val2))
    

    Solution 1b: Assuming increase from previous missing value(s), with N consecutive missing rows

    Using data.table, zoo and dplyr:

    setDT(tmp.data)[, consec := seq_len(.N), by=rleid(val2)]
    
    tmp.data %>%
      mutate(val2 = ifelse(is.na(val2), na.locf(val2) + consec, val2)) %>%
      select(-consec)
    

    Or written together:

    tmp.data %>%
      group_by(rleid(val2)) %>%
      mutate(consec = seq_along(val2)) %>%
      ungroup() %>%
      mutate(val2 = ifelse(is.na(val2), na.locf(val2) + consec, val2)) %>%
      select(id, val1, val2)
    

    Solution 2: Not assuming increase from previous missing value(s)

    Using dplyr and zoo:

    tmp.data %>%
      mutate(val2 = ifelse(is.na(val2), na.locf(val2) + 1, val2))