r, piecewise

How is as.numeric used here?


I've been trying to reproduce, in R, a piecewise linear regression model that was developed in the pricing software Emblem. I got it working using @Roland's answer in the post below.

https://stats.stackexchange.com/questions/61805/standard-error-of-slopes-in-piecewise-linear-regression-with-known-breakpoints

Following @Roland's answer, I used as.numeric((variable < X)) among the predictor variables to get the slope of the second segment.

What is going on here? Why does the "as.numeric" give me the correct answer? I can't find documentation on it and I would like to understand why this works.


Solution

  • It converts a boolean (TRUE / FALSE) value to numeric (1 / 0).

    (The R-y name for boolean is "logical": is.logical(TRUE) returns TRUE.)

    x < 10 # TRUE if x is less than 10, FALSE if x is 10 or more

    as.numeric(x<10) # 1 if x is less than 10, 0 if x is 10 or more

    That said, you don't really need as.numeric there. You could instead write:

    # will also work:
    mod2 <- lm(y~I((x<9.6)*x)+(x<9.6)+I((x>=9.6)*x)+(x>=9.6)-1)
    

    This version uses the logical values directly. When the model matrix is built, a logical is treated like a two-level factor (FALSE / TRUE), and lm expands a factor with k levels into dichotomous (0/1) variables, normally k-1 of them. That's why, if you use the code above, you'll see variable names like x < 9.6TRUE in the lm output.

    Then again, as.numeric is technically a hack, and a more transparent way to do it may be something like ifelse(x<9.6,1,0). Hacks are not necessarily bad, though, so you might even prefer a hackier one such as (x<9.6)*1. That won't work directly within a formula, because * has a special meaning in formulas, so you'd have to wrap it in I(): I((x<9.6)*1). I'd say as.numeric looks cleaner.
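To make the points above concrete, here is a minimal sketch on simulated data. The data, the variable names (seg_lo, seg_hi, fit_num and so on) and the check against two separate per-segment fits are illustrative assumptions, not part of @Roland's answer; only the breakpoint 9.6 and the logical-in-formula model are taken from the code above.

```r
# Simulated data (hypothetical): two linear segments with a known breakpoint at 9.6.
set.seed(1)
x <- seq(1, 20, by = 0.5)
y <- ifelse(x < 9.6, 2 + 1.5 * x, 25 - 0.9 * x) + rnorm(length(x), sd = 0.3)

# A comparison returns a logical vector; as.numeric() maps TRUE -> 1 and FALSE -> 0.
below <- x < 9.6
is.logical(below)        # TRUE
head(as.numeric(below))  # 1 1 1 1 1 1

# Three equivalent ways to build the same 0/1 indicator.
d_num    <- as.numeric(x < 9.6)
d_ifelse <- ifelse(x < 9.6, 1, 0)
d_mult   <- (x < 9.6) * 1
identical(d_num, d_ifelse) && identical(d_num, d_mult)

# Piecewise model with explicit 0/1 indicators: one intercept and one slope
# per segment, and no global intercept (the "0 +").
seg_lo <- as.numeric(x < 9.6)
seg_hi <- as.numeric(x >= 9.6)
fit_num <- lm(y ~ 0 + seg_lo + I(seg_lo * x) + seg_hi + I(seg_hi * x))

# Sanity check: the coefficients should match two separate straight-line fits.
fit_lo <- lm(y ~ x, subset = x < 9.6)
fit_hi <- lm(y ~ x, subset = x >= 9.6)
all.equal(unname(coef(fit_num)), unname(c(coef(fit_lo), coef(fit_hi))))

# Same model with the logicals used directly in the formula, as in the answer.
# The columns are coded differently, so compare fitted values, not coefficients.
fit_lgl <- lm(y ~ I((x < 9.6) * x) + (x < 9.6) + I((x >= 9.6) * x) + (x >= 9.6) - 1)
all.equal(unname(fitted(fit_num)), unname(fitted(fit_lgl)))
names(coef(fit_lgl))     # shows the automatically generated dummy-variable names
```

The indicator simply switches a term on (multiply by 1) or off (multiply by 0) for each observation, which is why each slope is estimated only from the rows of its own segment.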