I've been trying to figure out how to mimic a piecewise linear regression model developed in the pricing software Emblem, using R. I did that using @Roland's answer in the below post.
So, thanks to @Roland, I used as.numeric((variable < X)) in the predictor variables to get the slope of the second segment.
What is going on here? Why does the "as.numeric" give me the correct answer? I can't find documentation on it and I would like to understand why this works.
It converts a boolean (TRUE / FALSE) value to numeric (1 / 0).
(The R-y name for boolean is "logical": is.logical(TRUE) returns TRUE.)
x < 10 # TRUE if x is less than 10, FALSE if x is 10 or more
as.numeric(x<10) # 1 if x is less than 10, 0 if x is 10 or more
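As a small illustration of the conversion (the vector x here is made up for the example, it is not the data from the question):

```r
# Hypothetical example values, chosen only to show the conversion
x <- c(5, 9.6, 12)

flag <- x < 10            # logical vector, one TRUE/FALSE per element
num  <- as.numeric(flag)  # the same information stored as 1/0

is.logical(flag)  # the comparison yields a "logical" vector
is.numeric(num)   # after conversion it is an ordinary numeric vector
print(flag)
print(num)
```

Multiplying a coefficient by such a 1/0 column switches that coefficient on only for the rows where the condition holds, which is why it can pick out one segment of the piecewise fit.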
This being said, you don't really need an as.numeric there. What you could do instead is:
# will also work:
mod2 <- lm(y~I((x<9.6)*x)+(x<9.6)+I((x>=9.6)*x)+(x>=9.6)-1)
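This version uses the boolean values directly. lm treats a logical variable like a factor, and a factor with k levels is normally expanded into k-1 dichotomous (dummy) variables. That's why, if you use the code above, you'll see variable names like x < 9.6TRUE in the lm output. (Because this formula drops the intercept with -1, the first logical term keeps both of its levels, so you may also see x < 9.6FALSE, with an NA coefficient reported for a column that has become redundant.)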
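To see the dummy coding without fitting anything, you can look at the design matrix directly. This is only a sketch on made-up x values, using a formula with an intercept so that the usual k-1 coding applies:

```r
# Hypothetical predictor values, not the data from the question
x <- c(2, 5, 9, 10, 12, 15)

# Logical term: model.matrix treats it like a two-level factor
mm_logical <- model.matrix(~ (x < 9.6))

# Explicit conversion: an ordinary numeric 1/0 column
mm_numeric <- model.matrix(~ as.numeric(x < 9.6))

colnames(mm_logical)  # the dummy column is named after the TRUE level
colnames(mm_numeric)

# The second column holds the same 1/0 values in both cases
all(mm_logical[, 2] == mm_numeric[, 2])
```

Only the column names differ, so in a model like this the choice between the two spellings is mostly about how you want the coefficients labelled.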
Then again, technically, as.numeric is a hack, and a more transparent way to do it may be something like ifelse(x<9.6,1,0). But hacks are not necessarily bad, so you might also prefer a hackier hack such as (x<9.6)*1, but that won't work within a formula because * has a special meaning in formulas, so you'd have to use I around it: I((x<9.6)*1). I'd say as.numeric looks cleaner.
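Outside a formula, the three spellings give the same result. A quick check on made-up values:

```r
# Hypothetical example values
x <- c(2, 9.6, 15)

a <- as.numeric(x < 9.6)    # explicit type conversion
b <- ifelse(x < 9.6, 1, 0)  # spelled-out 1/0 choice
d <- (x < 9.6) * 1          # arithmetic coerces the logical to numeric

identical(a, b)  # compare the explicit conversion with ifelse
identical(a, d)  # compare it with the multiplication trick
```

Inside a formula, only the third one needs the I() wrapper, because there * is read as a formula operator rather than as multiplication.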