To my knowledge, there are three possible ways to code second-order (and higher-order) terms in a formula: we can use the function I(..), we can use the function poly(..), or we can construct the second-degree variable ourselves. My question is: how do these functions work?
set.seed(23)
A = rnorm(12)
B = 1:12
C = factor(rep(c(1,2,3),4))
B2=B^2
What is the equivalent of lm(A~poly(B,2)*C) when using I(..) or when using the variable B2?
Using raw=T in the poly(..) function does not change the results, correct?
lm(A~B2*C) or lm(A~I(B^2)*C) give you the result of squaring column B and then doing the regression on that squared term alone. Using poly(B,2) does something completely different: it generates both a first-degree and a second-degree term - see ?poly.
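As a rough check of the point above, the sketch below refits the three models on the data from the question. The object names fit_b2, fit_I and fit_poly are made up for this illustration. It only compares fitted values and counts coefficients; it is not a full comparison of the models.

```r
# Data from the question
set.seed(23)
A <- rnorm(12)
B <- 1:12
C <- factor(rep(c(1, 2, 3), 4))
B2 <- B^2

# fit_b2, fit_I and fit_poly are illustrative names, not from the question
fit_b2   <- lm(A ~ B2 * C)          # precomputed squared column
fit_I    <- lm(A ~ I(B^2) * C)      # squaring done inside the formula
fit_poly <- lm(A ~ poly(B, 2) * C)  # degree-1 and degree-2 terms together

# The first two are the same model written two ways,
# so their fitted values agree
all.equal(unname(fitted(fit_b2)), unname(fitted(fit_I)))

# poly(B, 2) contributes two columns instead of one,
# so that model has more coefficients (9 versus 6)
length(coef(fit_b2))
length(coef(fit_poly))
```

The difference in coefficient counts comes from poly(B,2) expanding to two columns, each of which also interacts with the two non-reference levels of C.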
Edit to add: by default, poly() calculates orthogonal polynomials, which are not the same as the standard polynomials derived from simply squaring, cubing, etc. a number.
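To make the distinction concrete, here is a minimal sketch using the data from the question. The names P_orth, P_raw, fit_orth, fit_raw and fit_manual are invented for this example. It suggests that the raw and orthogonal versions span the same column space, so the fitted values agree while the individual coefficients do not.

```r
# Data from the question
set.seed(23)
A <- rnorm(12)
B <- 1:12
C <- factor(rep(c(1, 2, 3), 4))

# P_orth and P_raw are illustrative names
P_orth <- poly(B, 2)              # orthogonal polynomial basis (the default)
P_raw  <- poly(B, 2, raw = TRUE)  # plain powers: B and B^2

# The orthogonal columns have unit length and are uncorrelated with each
# other, so their cross-product is (numerically) the identity matrix.
# unclass() drops the "poly" class so we work with a plain numeric matrix.
round(crossprod(unclass(P_orth)), 10)

# The second raw column is simply B squared
all.equal(unname(P_raw[, 2]), B^2)

# Three ways of writing a full quadratic in B interacted with C
fit_orth   <- lm(A ~ poly(B, 2) * C)
fit_raw    <- lm(A ~ poly(B, 2, raw = TRUE) * C)
fit_manual <- lm(A ~ (B + I(B^2)) * C)

# Same fitted values in all three cases
all.equal(fitted(fit_orth), fitted(fit_raw))
all.equal(fitted(fit_raw), fitted(fit_manual))

# The coefficients differ between the orthogonal and raw parameterisations
cbind(orthogonal = unname(coef(fit_orth)), raw = unname(coef(fit_raw)))
```

So raw=T does change the reported coefficients, but not the fitted values or the overall fit. Written with I(..), the model matching poly(B,2)*C in that sense is (B + I(B^2))*C rather than I(B^2)*C.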