Search code examples
rdplyrlag

R lag across arbitrary number of missing values


library(tidyverse)
testdata <- tibble(ID=c(1,NA,NA,2,NA,3),
                   Observation = LETTERS[1:6])

testdata1 <- testdata %>% 
  mutate(
    ID1 = case_when(
      is.na(ID) ~ lag(ID, 1),
      TRUE ~ ID
    )
  )

testdata1

I have a dataset like testdata, with a valid ID only when ID changes. There can be an arbitrary number of records in a set, but the above case_when and lag() structure does not fill in ID for all records, just for record 2 in each group. Is there a way to get the 3rd (or deeper) IDs filled with the appropriate value?


Solution

  • We can use fill from the tidyr package. Since you are using tidyverse, tidyr is already inlcuded.

    testdata1 <- testdata %>% 
      fill(ID)
    
    testdata1
    # # A tibble: 6 x 2
    #      ID Observation
    #   <dbl> <chr>      
    # 1     1 A          
    # 2     1 B          
    # 3     1 C          
    # 4     2 D          
    # 5     2 E          
    # 6     3 F
    

    Or we can use na.locf from the zoo package.

    library(zoo)
    
    testdata1 <- testdata %>% 
      mutate(ID = na.locf(ID))
    
    testdata1
    # # A tibble: 6 x 2
    #      ID Observation
    #   <dbl> <chr>      
    # 1     1 A          
    # 2     1 B          
    # 3     1 C          
    # 4     2 D          
    # 5     2 E          
    # 6     3 F