Search code examples
rlinear-regression

'Fixed knots and prediction' in piecewise linear regression using the segmented package


Hi this question relates to a question asked previously but as that was succesfully answered and my queries are now different I feel a new question is valid. Also please feel free to move this to Cross Validated if you think this is the wrong place.

The question to which this relates is here. I will provide the same data here for reference:

x<-c(1e-08, 1.1e-08, 1.2e-08, 1.3e-08, 1.4e-08, 1.6e-08, 1.7e-08, 
1.9e-08, 2.1e-08, 2.3e-08, 2.6e-08, 2.8e-08, 3.1e-08, 3.5e-08, 
4.2e-08, 4.7e-08, 5.2e-08, 5.8e-08, 6.4e-08, 7.1e-08, 7.9e-08, 
8.8e-08, 9.8e-08, 1.1e-07, 1.23e-07, 1.38e-07, 1.55e-07, 1.76e-07, 
1.98e-07, 2.26e-07, 2.58e-07, 2.95e-07, 3.25e-07, 3.75e-07, 4.25e-07, 
4.75e-07, 5.4e-07, 6.15e-07, 6.75e-07, 7.5e-07, 9e-07, 1.15e-06, 
1.45e-06, 1.8e-06, 2.25e-06, 2.75e-06, 3.25e-06, 3.75e-06, 4.5e-06, 
5.75e-06, 7e-06, 8e-06, 9.25e-06, 1.125e-05, 1.375e-05, 1.625e-05, 
1.875e-05, 2.25e-05, 2.75e-05, 3.1e-05)

y2<-c(-0.169718017273307, 7.28508517630734, 71.6802510299446, 164.637259265704, 
322.02901173786, 522.719633360006, 631.977073772459, 792.321270345847, 
971.810607095548, 1132.27551798986, 1321.01923840546, 1445.33152600664, 
1568.14204073109, 1724.30089942149, 1866.79717333592, 1960.12465709003, 
2028.46548012508, 2103.16027631327, 2184.10965255236, 2297.53360080873, 
2406.98288043262, 2502.95194879366, 2565.31085776325, 2542.7485752473, 
2499.42610084412, 2257.31567571328, 2150.92120390084, 1998.13356362596, 
1990.25434682546, 2101.21333152526, 2211.08405955931, 1335.27559108724, 
381.326449703455, 430.9020598199, 291.370887491989, 219.580548355043, 
238.708972427248, 175.583544448326, 106.057481792519, 59.8876372379487, 
26.965143266819, 10.2965349811467, 5.07812046132922, 3.19125838983254, 
0.788251933518549, 1.67980552001939, 1.97695007279929, 0.770663673279958, 
0.209216903989619, 0.0117903221723813, 0.000974437796492681, 
0.000668823762763647, 0.000545308757270207, 0.000490042305650751, 
0.000468780182460397, 0.000322977916070751, 0.000195423690538495, 
0.000175847622407421, 0.000135771259866332, 9.15607623591363e-05)

I have successfully fitted a piecewise linear model using segmented with breakpoints at 1e-07 and 1e-06:

linear.model2<-lm(y~x)
segmented.mod2<-segmented(linear.model2,seg.Z= ~x, psi=c(0.0000001,0.000001))

The resulting plot looks like this and is a reasonably good fit: Segmented fit

However it is clear that the breakpoints are not where I set them (which summary(segmented.mod2) confirms). Now I realise that this is the point of the package, it iterates until it finds the breakpoints which define the best fit however I have a theoretical reason to have the best fit defined by those exact breakpoints (I guess you would call this fixed knots?). Is this possible with the segmented package or can anybody offer a better solution with breakpoints exactly defined?

A further question is that since I will use these relationships for prediction I want to extract the slopes and intercepts that define the line segments. This should be possible using slope(segmented.mod2) and intercept(segmented.mod2) however the results in this case do not seem to correspond to what I was expecting:

Est.   St.Err.  t value  CI(95%).l  CI(95%).u
slope1  4.614e+10 3.936e+09  11.7200  3.824e+10  5.403e+10
slope2 -6.177e+09 4.397e+08 -14.0500 -7.059e+09 -5.296e+09
slope3 -2.534e+06 5.376e+06  -0.4714 -1.332e+07  8.248e+06

Est.
intercept1 -165.90
intercept2 3061.00
intercept3   46.93

Are the lines not of the form y=mx+c? An extraordinarily stupid question perhaps but they are my favourite! ;)

Any advice will be gratefully received.


Solution

  • If you stop the calculations after the first iteration your first breakpoints will be the starting values:

    sctrl <- seg.control(tol = 1e-04, it.max = 1, display = FALSE,
         stop.if.error = TRUE, K = 10, quant = FALSE, last = TRUE, maxit.glm = 25,
         h = 1, 
         n.boot=20, size.boot=NULL, gap=FALSE, jt=FALSE, nonParam=TRUE,
         random=TRUE, powers=c(1,1), seed=NULL)
     linear.model2<-lm(y2~x)
     segmented.mod3<-segmented(linear.model2,seg.Z= ~x, control=sctrl, 
                                    psi=c(0.0000001,0.000001))
    
     plot(log(x) ,predict(segmented.mod3))
    
     abline(v=log(0.0000001))      # cannot see anything of value on original scale
     abline(v=log(0.000001))