Search code examples
rbreakpointsnon-linear-regressionpiecewise

Selecting the number of breakpoints in segmented regression


I'm trying to estimate multiple breakpoints in X for a response variable Y. When I run the segmented package in R, I get 1 estimated breakpoint at x=14 if I specify 1 point in psi statement and two estimated points at x=6.5 and x=11.4 if I specify 2 points in psi. How can I determine if 2 breakpoints are optimal or 1 breakpoint is optimal? Please see the code and output below:

Specifying 1 breakpoint:

segmented.glm(obj = fit.glm, seg.Z = ~x, psi = 10)

Estimated Break-Point(s):
                            Est. St.Err
psi1.x   14  2.691

Null     deviance: 230311  on 1509  degrees of freedom
Residual deviance: 175795  on 1480  degrees of freedom
AIC: 11531
Convergence attained in 0 iter. (rel. change 1.5525e-08)

> slope(fit.seg)
$x
            Est.  St.Err.   t value CI(95%).l CI(95%).u
slope1 -0.847880 0.097683 -8.679900   -1.0393  -0.65643
slope2  0.036962 0.574770  0.064308   -1.0896   1.16350

Specifying 2 breakpoints:

fit.seg<-segmented(fit.glm, seg.Z=~x, psi= c(6, 11))
 
Estimated Break-Point(s):
        Est. St.Err
psi1.x  6.562  1.771
psi2.x 11.398  1.660

Null     deviance: 230311  on 1509  degrees of freedom
Residual deviance: 175594  on 1478  degrees of freedom
AIC: 11533
Convergence attained in 1 iter. (rel. change 0)

> slope(fit.seg)
$x
           Est. St.Err.  t value CI(95%).l CI(95%).u
slope1 -0.56943 0.23681 -2.40460  -1.03360  -0.10530
slope2 -1.25180 0.38974 -3.21190  -2.01570  -0.48794
slope3 -0.17365 0.31700 -0.54781  -0.79495   0.44765

I used seg.control but do not know how to interpret the output. (based on Muggeo, V.M.R. (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20–25.)

> o <- segmented(fit.glm, seg.Z=~x, psi=NA, control=seg.control(display=FALSE, K=2))
Warning message:
max number of iterations (1) attained 
> slope(o)  # defaults to confidence level of 0.95 (conf.level=0.95)
$x
           Est. St.Err.  t value CI(95%).l CI(95%).u
slope1 -0.56943 0.23681 -2.40460  -1.03360  -0.10530
slope2 -1.25180 0.38974 -3.21190  -2.01570  -0.48794
slope3 -0.17365 0.31700 -0.54781  -0.79495   0.44765

> o <- segmented(fit.glm, seg.Z=~x, psi=NA, control=seg.control(display=FALSE, K=1))
Warning messages:
1: max number of iterations (1) attained 
2: max number of iterations (1) attained 
> slope(o)  # defaults to confidence level of 0.95 (conf.level=0.95)
$x
            Est.  St.Err.   t value CI(95%).l CI(95%).u
slope1 -0.847880 0.097683 -8.679900   -1.0393  -0.65643
slope2  0.036966 0.574770  0.064314   -1.0896   1.16350

Can anyone help me figure out how can I decide whether 2 breakpoints are better estimates or 1 breakpoint?


Solution

  • the function selgmented() (also in the R package segmented) is a wrapper to select the "best" number of breakpoints via hypothesis testing (e.g. Score test) or the BIC. Currently selection via hypothesis testing is confined to 0,1, or 2 breakpoints to be selected. kind regards, vito