I have grouped data and a variable that I would like to smooth within each group. If the absolute change from the previous observation is small (e.g. 5 or less), I consider it measurement error and want to copy (roll forward) the old value instead. Within each group, the first measurement is used as the starting value, so I assume that the first observation per group is always correct (which is debatable).
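To make the rule concrete, here is a small base-R sketch that only flags which changes would count as measurement error under a threshold of 5. The values are group 1 of the filtered data printed in the question, and the names `changes` and `is_measurement_error` are illustrative, not part of the original code.

```r
# Toy values: group 1 after filter(variable > 0), as printed in the question
x <- c(9, 13, 1, 9, 6)

# Absolute change from the previous observation; the first observation
# has no predecessor and is assumed to be correct (NA here)
changes <- c(NA, abs(diff(x)))

# Changes of at most 5 are treated as measurement error
is_measurement_error <- !is.na(changes) & changes <= 5

data.frame(x, changes, is_measurement_error)
```

Only the 9 to 13 step (change 4) and the 9 to 6 step (change 3) fall under the threshold; those are the positions where the old value should be carried forward.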
library(dplyr)

set.seed(5)
mydata = data.frame(group = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                    year = seq(from = 2003, to = 2009, by = 1),   # recycled for both groups
                    variable = round(runif(14, min = -5, max = 15), 0))
mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  mutate(smooth5 = ifelse(abs(lag(variable, n = 1, default = first(variable)) - variable) <= 5, variable, 5)) %>%
  select(group, year, variable, smooth5) %>%
  arrange(group)
# A tibble: 10 x 4
# Groups: group [2]
group year variable smooth5
<dbl> <dbl> <dbl> <dbl>
1 1 2004 9 9
2 1 2005 13 13 # <- this change is |4|, thus it should use the old value 9
3 1 2006 1 5 # <- here the change from 13 to 1 is large, i.e. a genuine change, so 1 should be kept
4 1 2008 9 5
5 1 2009 6 6
6 2 2003 11 11
7 2 2004 14 14
8 2 2007 5 5
9 2 2008 1 1
10 2 2009 6 6
You are close, but your ifelse() call has its branches wrong: when the change is small it returns the current value, and otherwise it returns the constant 5. Below, I added a new variable previous for clarity. If abs(previous - variable) <= 5, you want previous; otherwise, you want variable:
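mydata %>%
  filter(variable > 0) %>%
  group_by(group) %>%
  # previous = the preceding raw value; the first row of each group is compared with itself
  mutate(previous = lag(variable, n = 1, default = first(variable)),
         # small change: keep the previous value; large change: accept the new one
         smooth5 = ifelse(abs(previous - variable) <= 5, previous, variable)) %>%
  select(group, year, variable, smooth5) %>%
  arrange(group)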
which gives
# A tibble: 10 x 4
# Groups: group [2]
group year variable smooth5
<dbl> <dbl> <dbl> <dbl>
1 1 2004 9 9
2 1 2005 13 9
3 1 2006 1 1
4 1 2008 9 9
5 1 2009 6 9
6 2 2003 11 11
7 2 2004 14 11
8 2 2007 5 5
9 2 2008 1 5
10 2 2009 6 1
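Note that lag() compares each value with the previous raw value, not with the previously smoothed one. That reproduces the output above, but it is not quite a roll forward: in group 2, the 1 in 2008 is smoothed to 5, yet the 6 in 2009 is compared with the raw 1 (a change of exactly 5) and therefore becomes 1 instead of staying at 5. If the retained value should keep carrying forward, the comparison has to be recursive. The following base-R sketch compares both variants, assuming a threshold of 5 and the filtered values printed above; `smooth_vs_raw()` and `roll_forward()` are hypothetical helpers, not dplyr functions.

```r
# Values left after filter(variable > 0), copied from the output above
d <- data.frame(group    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                variable = c(9, 13, 1, 9, 6, 11, 14, 5, 1, 6))

# Non-recursive rule used above: compare with the previous *raw* value
# (base-R stand-in for lag(variable, n = 1, default = first(variable)))
smooth_vs_raw <- function(x, threshold = 5) {
  previous <- c(x[1], head(x, -1))
  ifelse(abs(previous - x) <= threshold, previous, x)
}

# Hypothetical recursive helper: compare with the last *kept* value,
# so a kept value keeps rolling forward; the first value is always kept
roll_forward <- function(x, threshold = 5) {
  if (length(x) < 2) return(x)
  Reduce(function(kept, current) if (abs(current - kept) <= threshold) kept else current,
         x[-1], init = x[1], accumulate = TRUE)
}

# Apply both rules separately within each group
d$smooth_raw  <- ave(d$variable, d$group, FUN = smooth_vs_raw)
d$smooth_kept <- ave(d$variable, d$group, FUN = roll_forward)
d
```

The two columns agree everywhere except the last row of group 2. Reduce(accumulate = TRUE) is used because each step depends on the previous result, which a single vectorised ifelse() cannot express. Inside the dplyr pipeline, the same helper could be applied per group with `group_by(group) %>% mutate(smooth5 = roll_forward(variable))`.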