Search code examples
rreplacelagzero

Replace 0's with previous non-zero value per ID (lag)


How can I replace all 0's with the last non-zero value per ID in R?

Example:

Input:

df <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,2),
         Var1 = c(0,10, 30, 0, 0,50,80,0, 0, 57, 0)) 

Output:

df <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,2),
         Var1 = c(0,10, 30, 0, 0,50,80,0, 0, 57, 0),
         res = c(0,10,30,30,30,50,80,0,0,57,57))

Is there an easy way with lag function?


Solution

  • Here's a tidyverse approach:

    library(tidyverse)
    df %>% 
      group_by(ID) %>% 
      mutate(x = replace(Var1, cumsum(Var1 !=0) > 0 & Var1 == 0, NA)) %>% 
      fill(x)
    # # A tibble: 11 x 4
    # # Groups:   ID [2]
    # ID  Var1   res     x
    # <dbl> <dbl> <dbl> <dbl>
    # 1    1.    0.    0.    0.
    # 2    1.   10.   10.   10.
    # 3    1.   30.   30.   30.
    # 4    1.    0.   30.   30.
    # 5    1.    0.   30.   30.
    # 6    1.   50.   50.   50.
    # 7    1.   80.   80.   80.
    # 8    2.    0.    0.    0.
    # 9    2.    0.    0.    0.
    # 10    2.   57.   57.   57.
    # 11    2.    0.   57.   57.
    

    In the mutate step, we replace 0's with NA except for those that are at the beginning of each ID-run because in those cases we have no values to replace the NAs afterwards.


    If you have multiple columns to adjust, you can use:

    df %>% 
      group_by(ID) %>% 
      mutate_at(vars(starts_with("Var")), 
                funs(replace(., cumsum(. !=0) > 0 & . == 0, NA))) %>% 
      fill(starts_with("Var"))
    

    where df could be:

    df <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,2),
                     Var1 = c(0,10, 30, 0, 0,50,80,0, 0, 57, 0),
                     Var2 = c(4,0, 30, 0, 0,50,0,16, 0, 57, 0))