Search code examples
rregressionrepeatbreakpointsestimation

Estimate breakpoint until convergence


I have the following data:

> str(sample)
'data.frame':   500 obs. of  3 variables:
 $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ predictor: num  0.446 0.395 0.484 0.919 0.844 ...
 $ response : num  6.65 6.22 6.54 7.85 7.34 ...

I need to estimate the breakpoint used in stepwise regression, I am following the algorithm of Muggeo (2003). This is a plot of the predictor versus the response:

predvsresponse

I have created the following function:

breakpoint<-function(psi,data){
repeat{
  psi<-psi
  predictor<-data[,1]
  response<-data[,2]
  
  U<-ifelse(predictor>=psi,predictor-psi,0)
  V<-ifelse(predictor>=psi,-1,0)
  
  model<-lm(response~predictor+U+V,data=data)
  psi_new<-(model$coefficients[4]/model$coefficients[3])+psi
  psi<-psi_new
  if (abs(psi-psi_new)<0.00001 && model$coefficients[4]<0.00001) break
}
  result<-c(psi,model$coefficients[4])
  return(result)
}

However, the function is extremely sensitive to the initial value of psi, I am unsure why this would be. I think it might be to do with my convergence criteria. The function should converge when the "gap" between psi and psi_new is negligible and when the coefficient for V is basically zero.

The correct breakpoint determined by the package segmented is 0.27, when I input a psi of 0.25 I get the correct value, but when I input 0.4, the estimate changes to 0.32.


Solution

  • Maybe something like this:

    ## geneate some test data
    set.seed(132)
    predictor <- seq(0, 0.999, length.out = 500)
    response  <- approx(c(0, 0.27, 1), c(6.5, 6, 8.5), predictor)$y + rnorm(predictor, sd = 0.2)
    sample <- data.frame(predictor, response)
    
    breakpoint <- function(psi, data, tol=1e-5, maxit=50){
      iterations <- 0
      predictor  <- data[,1]
    
      repeat{
        U <- ifelse(predictor >= psi, predictor - psi, 0)
        V <- ifelse(predictor >= psi,              -1, 0)
    
        model <- lm(response ~ predictor + U + V, data = data)
        p <- coef(model)
        psi_new <- (p[4]/p[3]) + psi
        psi <- psi_new
        if ((abs(psi - psi_new) < tol) & abs(p[4]) < tol) break
        if (iterations > maxit) {
          warning("too many iterations")
          break
        }
        iterations <- iterations + 1
      }
      result <- c(psi = unname(psi), p[4])
      return(result)
    }
    
    psi <- 0.1
    breakpoint(psi, sample, tol=1e-5) # warning, but psi is ok
    breakpoint(psi, sample, tol=1e-2) # no warning
    
    plot(predictor, response, cex=0.5, pch=16)
    abline(v=breakpoint(psi, sample, tol=1e-2)["psi"], lty="dashed", col="red")
    
    ## for comparison
    library("segmented")
    segmented(lm(response ~ predictor, data=sample))
    

    In this example, the breakpoint was well identified, but optimization can in theory be trapped in a local minimum. Therefore, it may be wise to try different start values.

    Edit:

    • edited again, use abs(p4)
    • maximum iteration count to avoid endless loop

    breakpoint identification