Imputing NA with conditional LOCF


I've updated the question with a new, different problem. This time I would like to derive column Oxy2 from Oxy.

ID Oxy  Y   Oxy2
1  NA 2010   NA
1   0 2011    0
1  NA 2012   NA
1   1 2013    1
1  NA 2014    1
1  NA 2015    1
1  -1 2016    1
2   0 2011    0
2  NA 2012   NA
2   1 2013    1
2  -1 2014    1
3   0 2012    0
3  -1 2013   -1
3  NA 2014   NA
4  -1 2010   -1
4   1 2011    1
4  -1 2012    1
4  -1 2013    1
4   0 2014    1
4  NA 2015    1

Basically, I need to keep the NAs, if there are any, as long as the previous values of my Oxy variable are 0 or -1, and replace everything after the first 1 appears with 1.

Again, thanks for your suggestions.


Solution

    library(dplyr)
    library(zoo)

    df %>%
      group_by(ID) %>%
      # flag an NA that directly follows a 0 with the sentinel 999, then carry values forward
      mutate(Ins1 = na.locf(ifelse(is.na(Ins) & lag(Ins) == 0, 999, Ins), na.rm = FALSE),
             # turn the sentinel back into NA
             Ins2 = na_if(Ins1, 999))
    # one-step version:
    # mutate(Ins1 = na_if(na.locf(ifelse(is.na(Ins) & lag(Ins) == 0, 999, Ins), na.rm = FALSE), 999))
    
    # A tibble: 8 x 5
    # Groups:   ID [2]
         ID   Ins     Y  Ins1  Ins2
      <int> <int> <int> <dbl> <dbl>
    1     1     0  2010     0     0
    2     1    NA  2011   999    NA
    3     1     1  2012     1     1
    4     1    NA  2013     1     1
    5     1    NA  2014     1     1
    6     2     0  2011     0     0
    7     2     0  2012     0     0
    8     2    NA  2013   999    NA
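
To see what each step does, here is a minimal sketch on a single plain vector. It assumes the Ins values for ID 1 from the output above; the names ins, marked, filled and result are only for illustration.

```r
library(dplyr)  # lag(), na_if()
library(zoo)    # na.locf()

# Ins values for ID 1, copied from the output above
ins <- c(0, NA, 1, NA, NA)

# Step 1: mark an NA that directly follows a 0 with the sentinel 999
marked <- ifelse(is.na(ins) & dplyr::lag(ins) == 0, 999, ins)

# Step 2: carry the last non-NA value forward; the sentinel occupies the
# marked position, so the 0 is not carried into it
filled <- na.locf(marked, na.rm = FALSE)

# Step 3: turn the sentinel back into NA
result <- na_if(filled, 999)

print(rbind(ins, marked, filled, result))
```

Note that the sentinel trick only works if 999 can never occur as a real value in the column.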
    

    Update: To handle the -1 case, I made a small change to what @user12492692 suggested in the edit, namely replacing the | condition with %in%:

    df %>% 
      group_by(ID) %>% 
      mutate(Ins1 = na.locf(ifelse(is.na(Ins) & lag(Ins) %in% c(0,-1), 999, Ins), na.rm = FALSE), 
             Ins2 = na_if(Ins1, 999))
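
The code above is written against the Ins column, while the table in the question uses Oxy. As far as I can tell, the LOCF approach also leaves a -1 that comes after a 1 unchanged, whereas the expected Oxy2 shows 1 there. A possible alternative, sketched here under the assumption that rows are already ordered by Y within each ID, is to count the 1s seen so far with cumsum() and overwrite everything from the first 1 onward. This is not the answer's method, only one way to reproduce the Oxy2 column as posted.

```r
library(dplyr)

# Data from the question (Oxy2 is left out; it is what we want to compute)
df <- data.frame(
  ID  = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  Oxy = c(NA, 0, NA, 1, NA, NA, -1, 0, NA, 1, -1, 0, -1, NA, -1, 1, -1, -1, 0, NA),
  Y   = c(2010:2016, 2011:2014, 2012:2014, 2010:2015)
)

res <- df %>%
  group_by(ID) %>%
  # Oxy %in% 1 is FALSE for NA, so cumsum() counts the 1s seen so far;
  # once that count is positive, the row and every later row become 1
  mutate(Oxy2 = ifelse(cumsum(Oxy %in% 1) > 0, 1, Oxy)) %>%
  ungroup()

print(res, n = 20)
```

Rows before the first 1 keep their original value, including NA, so no sentinel value is needed here.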