I have the following data:
> str(sample)
'data.frame': 500 obs. of 3 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ predictor: num 0.446 0.395 0.484 0.919 0.844 ...
$ response : num 6.65 6.22 6.54 7.85 7.34 ...
I need to estimate the breakpoint used in stepwise regression, I am following the algorithm of Muggeo (2003). This is a plot of the predictor versus the response:
I have created the following function:
breakpoint<-function(psi,data){
repeat{
psi<-psi
predictor<-data[,1]
response<-data[,2]
U<-ifelse(predictor>=psi,predictor-psi,0)
V<-ifelse(predictor>=psi,-1,0)
model<-lm(response~predictor+U+V,data=data)
psi_new<-(model$coefficients[4]/model$coefficients[3])+psi
psi<-psi_new
if (abs(psi-psi_new)<0.00001 && model$coefficients[4]<0.00001) break
}
result<-c(psi,model$coefficients[4])
return(result)
}
However, the function is extremely sensitive to the initial value of psi, I am unsure why this would be. I think it might be to do with my convergence criteria. The function should converge when the "gap" between psi and psi_new is negligible and when the coefficient for V is basically zero.
The correct breakpoint determined by the package segmented is 0.27, when I input a psi of 0.25 I get the correct value, but when I input 0.4, the estimate changes to 0.32.
Maybe something like this:
## geneate some test data
set.seed(132)
predictor <- seq(0, 0.999, length.out = 500)
response <- approx(c(0, 0.27, 1), c(6.5, 6, 8.5), predictor)$y + rnorm(predictor, sd = 0.2)
sample <- data.frame(predictor, response)
breakpoint <- function(psi, data, tol=1e-5, maxit=50){
iterations <- 0
predictor <- data[,1]
repeat{
U <- ifelse(predictor >= psi, predictor - psi, 0)
V <- ifelse(predictor >= psi, -1, 0)
model <- lm(response ~ predictor + U + V, data = data)
p <- coef(model)
psi_new <- (p[4]/p[3]) + psi
psi <- psi_new
if ((abs(psi - psi_new) < tol) & abs(p[4]) < tol) break
if (iterations > maxit) {
warning("too many iterations")
break
}
iterations <- iterations + 1
}
result <- c(psi = unname(psi), p[4])
return(result)
}
psi <- 0.1
breakpoint(psi, sample, tol=1e-5) # warning, but psi is ok
breakpoint(psi, sample, tol=1e-2) # no warning
plot(predictor, response, cex=0.5, pch=16)
abline(v=breakpoint(psi, sample, tol=1e-2)["psi"], lty="dashed", col="red")
## for comparison
library("segmented")
segmented(lm(response ~ predictor, data=sample))
In this example, the breakpoint was well identified, but optimization can in theory be trapped in a local minimum. Therefore, it may be wise to try different start values.
Edit:
abs(p4)