I have been running analysis on data that involves breakpoints and I'm using the segmented
package. When I run my analysis, I get the following output:
library(segmented)
fit <- lm(XBRrate ~ Gap, data = batstraining2)
summary(fit)
segmented.fit <- segmented(fit, seg.Z = ~Gap, fixed.psi = 100)
summary(segmented.fit)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = fit, seg.Z = ~Gap, fixed.psi = 100)
Estimated Break-Point(s):
Est. St.Err
psi1.Gap 74 3.761
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0157829 0.0118917 1.327 0.1847
Gap 0.0036064 0.0001935 18.635 <2e-16 ***
U1.Gap -0.0012212 0.0002588 -4.719 NA
U1.fixed.Gap -0.0017419 0.0007509 -2.320 0.0205 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03699 on 1046 degrees of freedom
Multiple R-Squared: 0.6456, Adjusted R-squared: 0.6443
Boot restarting based on 10 samples. Last fit:
Convergence attained in 1 iterations (rel. change 2.627e-07)
The problem is that it includes U1.Gap, the automatically estimated breakpoint, in its final model, which is easiest to see in the plot.
I need this automatic breakpoint excluded. In our field we understand there are breakpoints at fixed intervals, which is why I want to use those fixed breakpoints, but since it includes the automatic breakpoint at 74, everything after 74 is wrong.
I have also tried using npsi, and while that avoids this exact issue, it sometimes puts breakpoints in very odd places that make no sense in context.
I also tried the strucchange package, but it didn't seem to give results that made sense. The examples I found online were formatted for date data, and this is not date-related. I'm not 100% sure that's the issue, but if strucchange is the better option, I would most likely need someone to show me a different way to use it.
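If you want to stay within the segmented package, one possibility is to supply your known breakpoints via psi and set it.max = 0 in seg.control(), which (per the package documentation, though worth verifying against your installed version) skips breakpoint estimation and keeps the supplied psi values fixed. A minimal sketch on simulated data, since I don't have batstraining2:

```r
library(segmented)

set.seed(1)
# simulated stand-in for the real data: true slope change at x = 100
x <- runif(300, 0, 200)
y <- 0.01 + 0.004 * x - 0.002 * pmax(x - 100, 0) + rnorm(300, sd = 0.05)
fit <- lm(y ~ x)

# it.max = 0 keeps the supplied psi fixed, so no automatic
# breakpoint (like your U1.Gap at 74) is added
fixed.fit <- segmented(fit, seg.Z = ~x, psi = 100,
                       control = seg.control(it.max = 0))
summary(fixed.fit)
```

With multiple known breakpoints you would pass a vector, e.g. psi = c(100, 200).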
If your goal is to fit a segmented regression without estimating breakpoints (i.e., with breakpoints that are known a priori), you could just estimate a piecewise linear regression using lm(). Specifically, you would need to include the variable of interest and a piecewise linear basis function. Here's what the basis function would look like:
pwl <- function(x, k) ifelse(x >= k, x - k, 0)
where x is the variable and k is the knot location. You could use it in the regression model as follows:
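To see what the basis does: it is zero up to the knot and then increases one-for-one past it. For example, with a knot at 180:

```r
pwl <- function(x, k) ifelse(x >= k, x - k, 0)

# zero at or below the knot, x - k above it
pwl(c(100, 180, 200), 180)
#> [1]  0  0 20
```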
data(mtcars)
library(car)
library(ggeffects)
mod <- lm(mpg ~ hp + pwl(hp, 180), data=mtcars)
summary(mod)
#>
#> Call:
#> lm(formula = mpg ~ hp + pwl(hp, 180), data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.0587 -2.1073 -0.6383 1.5470 8.1293
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.81289 1.94632 17.887 < 2e-16 ***
#> hp -0.11099 0.01504 -7.382 3.92e-08 ***
#> pwl(hp, 180) 0.10415 0.02996 3.476 0.00162 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.301 on 29 degrees of freedom
#> Multiple R-squared: 0.7194, Adjusted R-squared: 0.7
#> F-statistic: 37.17 on 2 and 29 DF, p-value: 9.947e-09
Note that the coefficient on hp gives the slope of the relationship before the knot, and the coefficient on hp plus the coefficient on pwl(hp, 180) gives the slope after the knot. You could use the linearHypothesis() function from the car package to test whether the slope after the knot is different from zero.
linearHypothesis(mod, "hp + pwl(hp, 180) = 0")
#> Linear hypothesis test
#>
#> Hypothesis:
#> hp + pwl(hp, 180) = 0
#>
#> Model 1: restricted model
#> Model 2: mpg ~ hp + pwl(hp, 180)
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 30 317.31
#> 2 29 315.99 1 1.3211 0.1212 0.7302
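You can also recover the post-knot slope directly by summing the two coefficients — a quick check against the summary above (the value in the comment is approximate, given the rounding in the printed coefficients):

```r
pwl <- function(x, k) ifelse(x >= k, x - k, 0)
mod <- lm(mpg ~ hp + pwl(hp, 180), data = mtcars)

# slope of mpg on hp after the knot at hp = 180:
# roughly -0.11099 + 0.10415 = -0.00684
post_knot <- unname(coef(mod)["hp"] + coef(mod)["pwl(hp, 180)"])
post_knot
```

The small magnitude of this sum is exactly what the non-significant linearHypothesis() result above reflects.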
You could even use the ggpredict() function from the ggeffects package to plot what the effect looks like:
p <- ggpredict(mod, term="hp [all]")
plot(p)
Created on 2023-02-23 with reprex v2.0.2