Search code examples
rlinear-regression

Splitting a point cloud into segments via linear regression


Say I have a point cloud with the following x and y coordinates:

#Sample Data
set.seed(666)
x = 1:60
y = c(1:20, rep(20,20),20:1)
y = y + runif(60,-1,1) #add some noise

My intention is to split the point cloud into three segments with the following trade-off criteria:

(1) the first segment should have the highest slope
(2) the second segment should have a slope close to 0 and 
(3) the third segment should have the lowest (negative) slope

as illustrated below:

plot(x,y)
abline(a = 0, b = 1, lwd = 2) #segment 1
abline(a = 20, b = 0, lwd = 2) #segment 2
abline(a = 60, b = -1, lwd = 2) #segment 3

enter image description here

The lines are just a wild guess, implying the slope of the three segments. Is there an efficient way/algorithm to fit three lines into the point cloud so that the respective criteria are fulfilled? Basically, I need to find a start point and end point for the second segment that maximizes the slope of the first slope, minimizes the slope of the third segment and keeps the slope of the second slope as close to 0 as possible.


Solution

  • It sounds like you want a piecewise linear regression. You can use the segmented package for this. Just give it a linear model and the number of breakpoints you want to estimate:

    library(segmented)
    
    pw_fit <- segmented(lm(y ~ x), seg.Z = ~x, npsi = 2)
    

    We can see we get decent estimates of the breakpoints at 20.22 and 41.20:

    summary(pw_fit)
    #> 
    #>  ***Regression Model with Segmented Relationship(s)***
    #> 
    #> Call: 
    #> segmented.lm(obj = lm(y ~ x), seg.Z = ~x, npsi = 2)
    #> 
    #> Estimated Break-Point(s):
    #>           Est. St.Err
    #> psi1.x 20.220  0.417
    #> psi2.x 41.206  0.391
    #> 
    #> Meaningful coefficients of the linear terms:
    #>             Estimate Std. Error t value Pr(>|t|)    
    #> (Intercept)  0.07668    0.30132   0.254      0.8    
    #> x            0.98256    0.02515  39.062   <2e-16 ***
    #> U1.x        -0.97310    0.03434 -28.338       NA    
    #> U2.x        -1.05405    0.03584 -29.409       NA    
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> Residual standard error: 0.6486 on 54 degrees of freedom
    #> Multiple R-Squared: 0.9913,  Adjusted R-squared: 0.9905 
    #> 
    #> Boot restarting based on 7 samples. Last fit:
    #> Convergence attained in 2 iterations (rel. change 5.9024e-12)
    

    We can even plot the model to see how well this fits our points:

    plot(pw_fit)
    points(x, y)
    

    Created on 2023-10-16 with reprex v2.0.2