r dplyr smoothing

How to use dplyr lag() to smooth minor changes in a variable


I have grouped data and a variable that I would like to smooth per group. If the absolute change from one observation to the next is small (e.g. less than 5), I consider it measurement error and want to copy (roll forward) the old value. Within each group, the first measurement serves as the default, so I assume that the first observation per group is always correct (which is debatable).

library(dplyr)

set.seed(5)
mydata <- data.frame(
  group = rep(c(1, 2), each = 7),
  year = rep(seq(from = 2003, to = 2009, by = 1), times = 2),
  variable = round(runif(14, min = -5, max = 15), 0)
)

# My attempt: keep rows with positive values, then smooth within each group
mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(smooth5 = ifelse(abs(lag(variable, n = 1, default = first(variable)) - variable) <= 5, variable, 5)) %>%
  select(group, year, variable, smooth5) %>%
  arrange(group)

# A tibble: 10 x 4
# Groups:   group [2]
   group  year variable smooth5
   <dbl> <dbl>    <dbl>   <dbl>
 1     1  2004        9       9
 2     1  2005       13      13  # <- the change from 9 is |4|, so the old value 9 should be used
 3     1  2006        1       5  # <- the change from 13 to 1 is large, so 1 should be kept
 4     1  2008        9       5
 5     1  2009        6       6
 6     2  2003       11      11
 7     2  2004       14      14
 8     2  2007        5       5
 9     2  2008        1       1
10     2  2009        6       6

Solution

  • You are close, but the branches of your ifelse() call are wrong: it returns variable when the change is small and the constant 5 otherwise. Below, I added a helper variable previous for clarity. If abs(previous - variable) <= 5, you want previous; otherwise you want variable:

    mydata %>%
      filter(variable > 0) %>%
      group_by(group) %>%
      mutate(previous = lag(variable, n = 1, default = first(variable)),
             smooth5 = ifelse(abs(previous - variable) <= 5, previous, variable)) %>%       
      select(group, year, variable, smooth5) %>%
      arrange(group)
    

    which gives

    # A tibble: 10 x 4
    # Groups:   group [2]
       group  year variable smooth5
       <dbl> <dbl>    <dbl>   <dbl>
     1     1  2004        9       9
     2     1  2005       13       9
     3     1  2006        1       1
     4     1  2008        9       9
     5     1  2009        6       9
     6     2  2003       11      11
     7     2  2004       14      11
     8     2  2007        5       5
     9     2  2008        1       5
    10     2  2009        6       1
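
A quick hand check of the corrected logic, using the filtered group 2 values from the output above (11, 14, 5, 1, 6). Note that lag() compares each value with the raw previous value, not with the value that was kept. If you would rather compare against the last kept value, which may be closer to a true roll forward, one possible sketch uses base R's Reduce(). The names roll_forward, smooth_lag and smooth_roll are made up for illustration, and the second variant is an assumption about the intent, not part of the answer above.

```r
library(dplyr)

# Filtered values of group 2, copied from the output above
x <- c(11, 14, 5, 1, 6)

# lag()-based version: each value is compared with the raw previous value
previous <- lag(x, n = 1, default = first(x))
smooth_lag <- ifelse(abs(previous - x) <= 5, previous, x)
smooth_lag
# 11 11 5 5 1

# Hypothetical alternative: compare with the last kept (already smoothed) value
roll_forward <- function(kept, current) {
  if (abs(current - kept) <= 5) kept else current
}
smooth_roll <- Reduce(roll_forward, x, accumulate = TRUE)
smooth_roll
# 11 11 5 5 5
```

The two versions differ only in the last row here: the raw previous value is 1 (a change of 5), while the last kept value is 5 (a change of 1). Inside the pipeline, the second variant could be used per group as mutate(smooth5 = Reduce(roll_forward, variable, accumulate = TRUE)).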