While working through a data frame with dplyr, how can I get the mean of the two rows immediately before the current row?
For example:
| value |
|-------|
| 1     |
| 2     |
| 3     |
| 4     |
In this case, when the code reaches the row with value 4, I want the result to be (2+3)/2 = 2.5.
I am thinking of something along these lines:
new_df <- df %>%
  summarise(
    roll_mean = cummean(value, -2)
  )
However, I know this does not work.
Using this data:
library(dplyr)
df <- data.frame(value = 1:4)
For two prior values, you could do it manually like so:
df %>%
  mutate(roll_mean = (lag(value) + lag(value, 2)) / 2)
# value roll_mean
#1 1 NA
#2 2 NA
#3 3 1.5
#4 4 2.5
Or, if the window might be larger, it could be easier to use slider::slide_*, a nice set of window functions. Here we ask for the mean over a window that starts two elements prior (.before = 2) and ends one element prior (.after = -1).
df %>%
  mutate(roll_mean = slider::slide_dbl(value, mean, .before = 2, .after = -1))
# value roll_mean
#1 1 NaN
#2 2 1.0
#3 3 1.5
#4 4 2.5
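By default, slider allows incomplete windows, which is why the first two rows above are NaN (an empty window) and 1.0 (a one-element window) rather than NA. If we want the same output as the manual version, we can require complete windows: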
df %>%
  mutate(roll_mean = slider::slide_dbl(value, mean, .before = 2, .after = -1, .complete = TRUE))
# value roll_mean
#1 1 NA
#2 2 NA
#3 3 1.5
#4 4 2.5
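For comparison, here is a minimal base R sketch of the same idea, in case adding a package is not an option. The helper name prev_mean and its n argument are made up for illustration and are not part of dplyr or slider. It simply averages the n values before each position and returns NA where fewer than n prior values exist.

```r
# Hypothetical helper (not from dplyr or slider): mean of the n values
# immediately preceding each element of x; NA when fewer than n exist.
prev_mean <- function(x, n = 2) {
  vapply(seq_along(x), function(i) {
    if (i <= n) {
      NA_real_                        # not enough prior values yet
    } else {
      mean(x[(i - n):(i - 1)])        # the n elements before position i
    }
  }, numeric(1))
}

df <- data.frame(value = 1:4)
df$roll_mean <- prev_mean(df$value, n = 2)
df
#   value roll_mean
# 1     1        NA
# 2     2        NA
# 3     3       1.5
# 4     4       2.5
```

Since it takes and returns a plain vector, the same function should also work inside mutate(), e.g. mutate(roll_mean = prev_mean(value, 2)). For long vectors the slider version is likely the better choice.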