I have a simple data.frame on which I want to compute some summary statistics on a rolling basis. For example, a rolling median over a window of five observations (two lags, the current one, and two leads) can be computed with:
library(dplyr)
library(zoo)  # rollapply() comes from zoo
x <- data.frame("vals" = rnorm(3e04))
y <- x %>%
  mutate(med5 = rollapply(data = vals,
                          width = 5,
                          FUN = median,
                          align = "center",
                          fill = NA,
                          na.rm = TRUE))
However, I would like to exclude the current row from this computation. I found the following approach:
z <- x %>%
  mutate(N = 1:n()) %>%
  do(data.frame(., prmed = sapply(.$N, function(i)
    median(.$vals[.$N %in% c((i - 2):(i - 1), (i + 1):(i + 2))]))))
This does what I want, provided I subsequently set the first two values to NA.
So far so good; the only problem is that the latter approach is terribly slow compared to rollapply.
Is there a way to achieve the outcome of the latter with the speed of the former?
Here is a solution based on dropping the third of the five values in each window, which is the current row of the calculation:
library(dplyr)
library(zoo)
set.seed(124)
x <- data.frame("vals" = rnorm(3e04))
y <- x %>%
  mutate(med5 = rollapply(data = vals,
                          width = 5,
                          # x[-3] drops the centre of the window, i.e. the current row
                          FUN = function(x) median(x[-3], na.rm = TRUE),
                          align = "center",
                          fill = NA))
head(y)
#          vals      med5
# 1 -1.38507062        NA
# 2  0.03832318        NA
# 3 -0.76303016 0.1253147
# 4  0.21230614 0.3914015
# 5  1.42553797 0.4562678
# 6  0.74447982 0.4562678
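
To gain some confidence that dropping the centre element gives the same numbers as the slow approach in the question, the two can be compared on a small input. The sketch below is only an illustration: `roll_median_excl` and `brute_force` are hypothetical helper names that do not come from zoo or dplyr, and generalising to any odd `width` is an assumption beyond the five-observation case discussed above.

```r
library(zoo)

# Hypothetical helper: rolling median over an odd-width, centred window
# that leaves out the centre (current) observation.
roll_median_excl <- function(v, width = 5) {
  stopifnot(width %% 2 == 1, width >= 3)
  centre <- (width + 1) / 2          # position of the current row in the window
  rollapply(v, width = width,
            FUN = function(w) median(w[-centre], na.rm = TRUE),
            align = "center", fill = NA)
}

# Brute-force reference, in the spirit of the sapply() version in the question.
# Incomplete windows at both ends are set to NA to match fill = NA.
brute_force <- function(v, width = 5) {
  h <- (width - 1) / 2
  n <- length(v)
  sapply(seq_len(n), function(i) {
    if (i <= h || i > n - h) return(NA_real_)
    median(v[c((i - h):(i - 1), (i + 1):(i + h))], na.rm = TRUE)
  })
}

set.seed(1)
v <- rnorm(200)
fast <- roll_median_excl(v)
slow <- brute_force(v)

# Compare the two results, including the NA padding at both ends
all.equal(fast, slow)
```

Note that with `align = "center"` and `fill = NA`, the last two positions are NA as well as the first two, so the brute-force reference pads both ends. Inside a pipeline, the helper could be used as `mutate(med5 = roll_median_excl(vals))`.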