Search code examples
rdata-analysis

How do I remove the automatic breakpoints/coefficients from the segmented package?


I have been running analysis on data that involves breakpoints and I'm using the segmented package. When I run my analysis, I get the following output:

fit <- lm(XBRrate~Gap, data = batstraining2)
summary(fit)
segmented.fit <- segmented(fit, seg.Z = ~Gap, fixed.psi = 100)
summary(segmented.fit)
> summary(segmented.fit)

    ***Regression Model with Segmented Relationship(s)***

Call: 
segmented.lm(obj = fit, seg.Z = ~Gap, fixed.psi = 100)

Estimated Break-Point(s):
          Est. St.Err
psi1.Gap   74  3.761

Meaningful coefficients of the linear terms:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.0157829  0.0118917   1.327   0.1847    
Gap           0.0036064  0.0001935  18.635   <2e-16 ***
U1.Gap       -0.0012212  0.0002588  -4.719       NA    
U1.fixed.Gap -0.0017419  0.0007509  -2.320   0.0205 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03699 on 1046 degrees of freedom
Multiple R-Squared: 0.6456,  Adjusted R-squared: 0.6443 

Boot restarting based on 10 samples. Last fit:
Convergence attained in 1 iterations (rel. change 2.627e-07)

The problem is it includes the U1.Gap in its final model, which is easiest to see in the plot.

I need to not have this auto breakpoint included. In the community, we understand there are breakpoints at fixed intervals, which is why I want to use those fixed breakpoints. but since it includes that auto breakpoint at 74 everything is wrong after 74.

I have tried using npsi as well, and while the breakpoints won't have the exact same issue, sometimes it will put breakpoints at very odd places that make no sense to actually have breakpoints in context.

I also tried using the strucchange package but it didn't seem to give results that made sense. The examples I found online were formatted for date data, and this is not date related. I'm not 100% sure if this is the issue, but if that's a better option I would need someone to show me a different way to use it, most likely.


Solution

  • If your goal is to fit a segmented regression without estimating breakpoints (i.e., with breakpoints that are known a priori), you could just estimated a piecewise linear regression using lm(). Specifically, you would need to include the variable of interest and a piecewise linear basis function. Here's what the basis function would look like:

    pwl <- function(x, k)ifelse(x >= k, x-k, 0)
    
    

    where x is the variable and k is the knot location. You could use it in the regression model as follows:

    data(mtcars)
    library(car)
    library(ggeffects)
    
    mod <- lm(mpg ~ hp + pwl(hp, 180), data=mtcars)
    summary(mod)
    #> 
    #> Call:
    #> lm(formula = mpg ~ hp + pwl(hp, 180), data = mtcars)
    #> 
    #> Residuals:
    #>     Min      1Q  Median      3Q     Max 
    #> -5.0587 -2.1073 -0.6383  1.5470  8.1293 
    #> 
    #> Coefficients:
    #>              Estimate Std. Error t value Pr(>|t|)    
    #> (Intercept)  34.81289    1.94632  17.887  < 2e-16 ***
    #> hp           -0.11099    0.01504  -7.382 3.92e-08 ***
    #> pwl(hp, 180)  0.10415    0.02996   3.476  0.00162 ** 
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> Residual standard error: 3.301 on 29 degrees of freedom
    #> Multiple R-squared:  0.7194, Adjusted R-squared:    0.7 
    #> F-statistic: 37.17 on 2 and 29 DF,  p-value: 9.947e-09
    
    

    Note that the coefficient on hp gives the slope of the relationship before the knot and the coefficient on hp plus the coefficient on pwl(hp, 180) gives the slope after the knot. You could use the linearHypothesis() function from the car package to test whether the slope after the knot is different from zero.

    linearHypothesis(mod, "hp + pwl(hp, 180) = 0")
    #> Linear hypothesis test
    #> 
    #> Hypothesis:
    #> hp  + pwl(hp, 180) = 0
    #> 
    #> Model 1: restricted model
    #> Model 2: mpg ~ hp + pwl(hp, 180)
    #> 
    #>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
    #> 1     30 317.31                           
    #> 2     29 315.99  1    1.3211 0.1212 0.7302
    

    You could even use the ggpredict() function from the ggeffects package to plot what the effect looks like:

    p <- ggpredict(mod, term="hp [all]")
    plot(p)
    

    Created on 2023-02-23 with reprex v2.0.2