Tags: r, regression, linear-regression, nls, piecewise

Syntax for three-piece segmented regression using NLS in R when concave


My goal is to fit a three-piece (i.e., two break-point) regression model to make predictions using propagate's predictNLS function, making sure to define knots as parameters, but my model formula seems off.

I've used the segmented package to estimate the breakpoint locations (used as starting values in NLS), but would like to keep my models in the NLS format, specifically nlsLM {minpack.lm}, because I am fitting other types of curves to my data using NLS, want to allow NLS to optimize the knot values, am sometimes using variable weights, and need to be able to easily calculate the Monte Carlo confidence intervals from propagate. Though I'm very close to having the right syntax for the formula, I'm not getting the expected/required behaviour near the breakpoint(s). The segments SHOULD meet directly at the breakpoints (without any jumps), but at least on this data I'm getting a weird local minimum at the breakpoint (see plots below).

Below is an example of my data and general process. I believe my issue to be in the NLS formula.

library(minpack.lm)
library(segmented)

y <- c(-3.99448113, -3.82447011, -3.65447803, -3.48447030, -3.31447855, -3.14448753, -2.97447972, -2.80448401, -2.63448380, -2.46448069, -2.29448796, -2.12448912, -1.95448783, -1.78448797, -1.61448563, -1.44448719, -1.27448469, -1.10448651, -0.93448525, -0.76448637, -0.59448626, -0.42448586, -0.25448588, -0.08448548,  0.08551417,  0.25551393,  0.42551411,  0.59551395,  0.76551389,  0.93551398)

x <- c(61586.1711, 60330.5550, 54219.9925, 50927.5381, 48402.8700, 45661.9175, 37375.6023, 33249.1248, 30808.6131, 28378.6508, 22533.3782, 13901.0882, 11716.5669, 11004.7305, 10340.3429,  9587.7994,  8736.3200,  8372.1482,  8074.3709,  7788.1847,  7499.6721,  7204.3168,  6870.8192,  6413.0828,  5523.8097,  3961.6114,  3460.0913,  2907.8614, 2016.1158,   452.8841)


df <- data.frame(x, y)


#Use Segmented to get estimates for parameters with 2 breakpoints
my.seg2 <- segmented(lm(y ~ x, data = df), seg.Z = ~ x, npsi = 2)


#extract knot, intercept, and coefficient values to use as NLS start points
my.knot1 <- my.seg2$psi[1,2]
my.knot2 <- my.seg2$psi[2,2]
my.m_2 <- slope(my.seg2)$x[1,1]
my.b1 <- my.seg2$coefficients[[1]]
my.b2 <- my.seg2$coefficients[[2]]
my.b3 <- my.seg2$coefficients[[3]]

#Fit a NLS model to ~replicate segmented model. Presumably my model formula is where the problem lies
my.model <- nlsLM(
  y ~ m * x + b +
    b2 * ifelse(x >= knot1 & x <= knot2, 1, 0) * (x - knot1) +
    b3 * ifelse(x > knot2, 1, 0) * (x - knot2 - knot1),
  data = df,
  start = c(m = my.m_2, b = my.b1, b2 = my.b2, b3 = my.b3,
            knot1 = my.knot1, knot2 = my.knot2))

How it should look

plot(my.seg2)

[plot: segmented fit, with the segments meeting cleanly at each breakpoint]

How it does look

plot(x, y)
lines(x=x, y=predict(my.model), col='black', lty = 1, lwd = 1)

[plot: nlsLM fit, showing an unexpected local minimum near the breakpoint]

I was pretty sure I had it "right", but when the 95% confidence intervals are plotted along with the line and the prediction resolution (i.e., the density of x points) is increased, things seem dramatically incorrect.

[plot: nlsLM fit with 95% confidence intervals at a higher prediction resolution]
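For context, intervals like the ones in the plot above would come from propagate roughly as follows. This is only a sketch: the grid size and the way the returned summary columns are handled are illustrative, not the exact code used here.

library(propagate)

## dense grid of x values for smoother interval curves
new.x <- data.frame(x = seq(min(x), max(x), length.out = 200))

## Monte Carlo / error-propagation confidence intervals for the nlsLM fit
my.pred <- predictNLS(my.model, newdata = new.x)
head(my.pred$summary)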

Thank you all for your help.


Solution

  • Define g to be a grouping vector, having the same length as x, that takes on the values 1, 2, 3 for the three sections of the x axis, and create an nls model from these. The resulting plot looks ok.

    ## assign each x value to one of the three segments
    my.knots <- c(my.knot1, my.knot2)
    g <- cut(x, c(-Inf, my.knots, Inf), labels = FALSE)

    ## separate intercept a[g] and slope b[g] for each segment
    fm <- nls(y ~ a[g] + b[g] * x, df, start = list(a = c(1, 1, 1), b = c(1, 1, 1)))
    
    plot(y ~ x, df)
    lines(fitted(fm) ~ x, df, col = "red")
    

    [plot: data with the unconstrained piecewise fit in red]

    Constraints

    Although the above looks ok and may be sufficient it does not guarantee that the segments intersect at the knots. To do that we must impose the constraints that both sides are equal at the knots:

    a[2] + b[2] * my.knots[1] = a[1] + b[1] * my.knots[1]
    a[3] + b[3] * my.knots[2] = a[2] + b[2] * my.knots[2]
    

    so

    a[2] = a[1] + (b[1] - b[2]) * my.knots[1]
    a[3] = a[2] + (b[2] - b[3]) * my.knots[2]
         = a[1] + (b[1] - b[2]) * my.knots[1] + (b[2] - b[3]) * my.knots[2]
    

    giving:

    # returns a vector of the three a values
    avals <- function(a1, b) unname(cumsum(c(a1, -diff(b) * my.knots)))
    
    # the intercepts a[g] are derived from a1 and the slopes so that adjacent
    # segments meet at the knots
    fm2 <- nls(y ~ avals(a1, b)[g] + b[g] * x, df, start = list(a1 = 1, b = c(1, 1, 1)))
    

    To get the three a values we can use:

    co <- coef(fm2)
    avals(co[1], co[-1])
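
    As a quick sanity check, the segments on either side of each knot should now agree exactly at the knot. A small verification sketch using the values above (the aa/bb names are just illustrative):

    aa <- avals(co[1], co[-1])   # the three intercepts
    bb <- co[-1]                 # the three slopes

    # both values in each pair should be identical, i.e. the segments meet at the knots
    c(aa[1] + bb[1] * my.knots[1], aa[2] + bb[2] * my.knots[1])
    c(aa[2] + bb[2] * my.knots[2], aa[3] + bb[3] * my.knots[2])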
    

    To get the residual sum of squares:

    deviance(fm2)
    ## [1] 0.193077
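
    A quick visual check of the constrained fit, along the same lines as the earlier plot (sketch):

    plot(y ~ x, df)
    lines(fitted(fm2) ~ x, df, col = "blue")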
    

    Polynomial

    Although it involves a large number of parameters, a polynomial fit could be used in place of the segmented linear regression. A 12th degree polynomial involves 13 parameters but has a lower residual sum of squares than the segmented linear regression. A lower degree could be used at the cost of a correspondingly higher residual sum of squares; for example, a 7th degree polynomial involves 8 parameters and visually looks not too bad, although its residual sum of squares is higher (see the sketch after the code below).

    fm12 <- nls(y ~ cbind(1, poly(x, 12)) %*% b, df, start = list(b = rep(1, 13)))
    
    deviance(fm12)
    ## [1] 0.1899218
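
    The 7th degree version mentioned above works the same way; as noted, its residual sum of squares comes out somewhat higher than that of fm12 (a sketch, with fm7 as an illustrative name):

    fm7 <- nls(y ~ cbind(1, poly(x, 7)) %*% b, df, start = list(b = rep(1, 8)))
    deviance(fm7)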