Search code examples
rsplinenon-linear-regressionpiecewise

How to parametrize piecewise regression coefficient to represent the slope for the following interval (instead of the change in the slope)


Consider the following dataset

Quantity <- c(25,39,45,57,70,85,89,100,110,124,137,150,177)
Sales <- c(1000,1250,2600,3000,3500,4500,5000,4700,4405,4000,3730,3400,3300)
df <- data.frame(Quantity,Sales)
df

Plotting the data, the distribution of observations is clearly non-linear, but presents a likely breaking-point around Quantity = 89 (I skip the plot here). Therefore, I built a joint piecewise linear model as follows

df$Xbar <- ifelse(df$Quantity>89,1,0)
df$diff <- df$Quantity - 89

reg <- lm(Sales ~ Quantity + I(Xbar * (Quantity - 89)), data = df)
summary(reg)

or simply

df$X <- df$diff*df$Xbar

reg <- lm(Sales ~ Quantity + X, data = df)
summary(reg)   

However, according to this parametrization, the coefficient of X represents the change in the slope from the preceding interval.

How can I parametrize the relevant coefficient to rather represent the slope for the second interval?

I did some research but I was unable to find the desired specification, apart from some automatization in stata (see the voice 'marginal' here https://www.stata.com/manuals13/rmkspline.pdf).

Any help is much appreciated. Thank you!

Acknowledgement: the workable example is retrieved from https://towardsdatascience.com/unraveling-spline-regression-in-r-937626bc3d96


Solution

  • The key here is to use a logical variable is.right which is TRUE for the points to the right of 89 and FALSE otherwise.

    From the the output shown 60.88 is the slope to the left of 89 and -19.97 is the slope to the right. The lines intersect at Quantity = 89, Sales = 4817.30.

    is.right <- df$Quantity > 89
    fm <- lm(Sales ~ diff : is.right, df)
    
    fm
    ## Call:
    ## lm(formula = Sales ~ diff:is.right, data = df)
    ##
    ## Coefficients:
    ##        (Intercept)  diff:is.rightFALSE   diff:is.rightTRUE  
    ##            4817.30               60.88              -19.97  
    

    Alternatives

    Alternately if you want to use Xbar from the question do it this way. It gives the same coefficients as fm.

    fm2 <- lm(Sales ~ diff : factor(Xbar), df)
    

    or

    fm3 <- lm(Sales ~ I(Xbar * diff) + I((1 - Xbar) * diff), df)
    

    Double check with nls

    We can double check these using nls with the following formulation which makes use of the fact that if we extend both lines the one to use at any Quantity is the lower of the two.

    st <- list(a = 0, b1 = 1, b2 = -1)
    fm4 <- nls(Sales ~ a + pmin(b1 * (Quantity - 89), b2 * (Quantity - 89)), start = st)
    fm4
    ## Nonlinear regression model
    ##   model: Sales ~ a + pmin(b1 * (Quantity - 89), b2 * (Quantity - 89))
    ##    data: parent.frame()
    ##       a      b1      b2 
    ## 4817.30   60.88  -19.97 
    ## residual sum-of-squares: 713120
    ##
    ## Number of iterations to convergence: 1 
    ## Achieved convergence tolerance: 2.285e-09
    

    This would also work:

    fm5 <- nls(Sales ~ a + ifelse(Quantity > 89, b2, b1) * diff, df, start = st)
    

    Plot

    Here is a plot:

    plot(Sales ~ Quantity, df)
    lines(fitted(fm) ~ Quantity, df)
    

    screenshot

    Model matrix

    And here is the model matrix for the linear regression:

    > model.matrix(fm)
       (Intercept) diff:is.rightFALSE diff:is.rightTRUE
    1            1                -64                 0
    2            1                -50                 0
    3            1                -44                 0
    4            1                -32                 0
    5            1                -19                 0
    6            1                 -4                 0
    7            1                  0                 0
    8            1                  0                11
    9            1                  0                21
    10           1                  0                35
    11           1                  0                48
    12           1                  0                61
    13           1                  0                88