r time-series dplyr panel-data mean-square-error

Dplyr workflow reflecting Mean Square Error estimator in change point analysis

Background

Hi, I would like to check whether the provided dplyr workflow reflects the calculation of Mean Square Error estimator as descried by Taylor (2010).

Problem

I would like for the workflow to reflect the following equation:

where:

24 reflects the total number of observation in Taylor's data set. In the case of the provided data this would correspond to 10 observations per group.

Data

The utilised data is fairly straightforward and resembles the extract:

set.seed(123)
dta <- data.frame(group = rep(LETTERS[1:3], 10),
                  year = rep(2001:2010, 3),
                  value = round(runif(30),2))

Suggested workflow

The draft workflow would correspond to the code:

# Pkgs
Vectorize(require)(package = c("dplyr", "magrittr"),
                               char = TRUE)

# Workflow
dta %<>%
  arrange(group, year) %>% 
  group_by(group) %>% 
  mutate(X1 = cumsum(value) / row_number()) %>% 
  mutate(X2 = cumsum(lead(value)) / (length(value) - row_number())) %>% 
  mutate(MSEe = cumsum((value - X1) ^ 2  + (value - X2) ^ 2))

Reference

Taylor, 2010, Change-point analysis: a powerful new tool for detecting changes Available: http://www.variation.com/cpa/tech/changepoint.html

Solution

This is what I have so far ... hope to learn a better way

dta %>%
    arrange(group, year) %>% 
    group_by(group) %>% 
    mutate(cmX1=cummean(value), cmX2=(sum(value)-cumsum(value)) / (length(value) - row_number())) %>%
    do(data.frame(m=1:nrow(.), 
        MSE=sapply(1:nrow(.), function(n) sum((.$value[1:n] - .$cmX1[n])^2) + 
                sum((.$value[(n+1):length(.$value)] - .$cmX2[n])^2)))) %>% 
    ungroup()

numerical check:

mse <- function(x, m) { 
    meanX1 <- sum(x[1:m]) / m 
    meanX2 <- sum(x[(m+1):length(x)]) / (length(x)-m) 
    sum((x[1:m] - meanX1)^2) + sum((x[(m+1):length(x)] - meanX2)^2) 
} #mse 

dta <- dta[order(dta$group, dta$year),]
sapply(1:10, function(n) mse(dta$value[dta$group=="A"], n))