Search code examples
rtime-seriesdplyrpanel-datamean-square-error

Dplyr workflow reflecting Mean Square Error estimator in change point analysis


Background

Hi, I would like to check whether the provided dplyr workflow reflects the calculation of Mean Square Error estimator as descried by Taylor (2010).

Problem

I would like for the workflow to reflect the following equation: equation

where:

  • 24 reflects the total number of observation in Taylor's data set. In the case of the provided data this would correspond to 10 observations per group.

Data

The utilised data is fairly straightforward and resembles the extract:

set.seed(123)
dta <- data.frame(group = rep(LETTERS[1:3], 10),
                  year = rep(2001:2010, 3),
                  value = round(runif(30),2))

Suggested workflow

The draft workflow would correspond to the code:

# Pkgs
Vectorize(require)(package = c("dplyr", "magrittr"),
                               char = TRUE)

# Workflow
dta %<>%
  arrange(group, year) %>% 
  group_by(group) %>% 
  mutate(X1 = cumsum(value) / row_number()) %>% 
  mutate(X2 = cumsum(lead(value)) / (length(value) - row_number())) %>% 
  mutate(MSEe = cumsum((value - X1) ^ 2  + (value - X2) ^ 2))

Reference

Taylor, 2010, Change-point analysis: a powerful new tool for detecting changes Available: http://www.variation.com/cpa/tech/changepoint.html


Solution

  • This is what I have so far ... hope to learn a better way

    dta %>%
        arrange(group, year) %>% 
        group_by(group) %>% 
        mutate(cmX1=cummean(value), cmX2=(sum(value)-cumsum(value)) / (length(value) - row_number())) %>%
        do(data.frame(m=1:nrow(.), 
            MSE=sapply(1:nrow(.), function(n) sum((.$value[1:n] - .$cmX1[n])^2) + 
                    sum((.$value[(n+1):length(.$value)] - .$cmX2[n])^2)))) %>% 
        ungroup()
    

    numerical check:

    mse <- function(x, m) { 
        meanX1 <- sum(x[1:m]) / m 
        meanX2 <- sum(x[(m+1):length(x)]) / (length(x)-m) 
        sum((x[1:m] - meanX1)^2) + sum((x[(m+1):length(x)] - meanX2)^2) 
    } #mse 
    
    dta <- dta[order(dta$group, dta$year),]
    sapply(1:10, function(n) mse(dta$value[dta$group=="A"], n))