Hi, I would like to check whether the provided dplyr
workflow reflects the calculation of Mean Square Error estimator as descried by Taylor (2010).
I would like for the workflow to reflect the following equation:
where:
The utilised data is fairly straightforward and resembles the extract:
set.seed(123)
dta <- data.frame(group = rep(LETTERS[1:3], 10),
year = rep(2001:2010, 3),
value = round(runif(30),2))
The draft workflow would correspond to the code:
# Pkgs
Vectorize(require)(package = c("dplyr", "magrittr"),
char = TRUE)
# Workflow
dta %<>%
arrange(group, year) %>%
group_by(group) %>%
mutate(X1 = cumsum(value) / row_number()) %>%
mutate(X2 = cumsum(lead(value)) / (length(value) - row_number())) %>%
mutate(MSEe = cumsum((value - X1) ^ 2 + (value - X2) ^ 2))
Taylor, 2010, Change-point analysis: a powerful new tool for detecting changes Available: http://www.variation.com/cpa/tech/changepoint.html
This is what I have so far ... hope to learn a better way
dta %>%
arrange(group, year) %>%
group_by(group) %>%
mutate(cmX1=cummean(value), cmX2=(sum(value)-cumsum(value)) / (length(value) - row_number())) %>%
do(data.frame(m=1:nrow(.),
MSE=sapply(1:nrow(.), function(n) sum((.$value[1:n] - .$cmX1[n])^2) +
sum((.$value[(n+1):length(.$value)] - .$cmX2[n])^2)))) %>%
ungroup()
numerical check:
mse <- function(x, m) {
meanX1 <- sum(x[1:m]) / m
meanX2 <- sum(x[(m+1):length(x)]) / (length(x)-m)
sum((x[1:m] - meanX1)^2) + sum((x[(m+1):length(x)] - meanX2)^2)
} #mse
dta <- dta[order(dta$group, dta$year),]
sapply(1:10, function(n) mse(dta$value[dta$group=="A"], n))