I want to add all possible interactions among eight variables, all of which are categorical. My dataset looks like the following (screenshot omitted).
I use as.formula to include all interactions. My code is below:
f = as.formula(y ~ .^8)
x = model.matrix(f, data)[, -1]
y = data$y
However, my x becomes the following (screenshot omitted).
There are 6560 columns in total, and I have no idea why. Shouldn't the variables in x still take the values 1, 2, 3? How should I fix or interpret this?
Thank you!
You have eight variables, each with three levels. You want to include every possible interaction, that is, every possible combination of the eight factors.
There are 3^8 different possible combinations of values for your predictors, so the full design matrix has 3^8 = 6561 columns of main effects and interactions, including the intercept. Your x has 6560 columns because the [, -1] in your code drops the intercept column.
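As a rough check of that count, here is a minimal sketch. It assumes every factor has exactly three levels and uses the default treatment contrasts; the data frame d and its columns are made up for illustration, and only four factors are used so the matrix stays small. A k-way interaction among 3-level factors contributes 2^k columns, and there are choose(8, k) such interactions.

```r
# Count columns term by term: choose(8, k) interactions of order k,
# each contributing 2^k dummy columns (k = 0 is the intercept).
k <- 0:8
n_cols <- sum(choose(8, k) * 2^k)
n_cols == 3^8

# Confirm the same pattern on a small, made-up example with four factors.
lv <- factor(1:3)
d <- expand.grid(x1 = lv, x2 = lv, x3 = lv, x4 = lv)
d$y <- seq_len(nrow(d))  # placeholder response, values are irrelevant here
mm4 <- model.matrix(y ~ .^4, data = d)
ncol(mm4) == 3^4
```

The same reasoning scales to eight factors, although building that full matrix on a complete grid would be large (6561 rows by 6561 columns).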
To see how they are encoded consider a single 3-level predictor:
> model.matrix(lm(y ~ x1))
  (Intercept) x12 x13
1           1   0   0
2           1   1   0
3           1   0   1
A single 3-level factor is encoded as 3 columns: an intercept plus two dummy variables.
Now add a second 3-level predictor and their interaction:
> model.matrix(lm(y ~ (x1+x2)^2))
  (Intercept) x12 x13 x22 x23 x12:x22 x13:x22 x12:x23 x13:x23
1           1   0   0   0   1       0       0       0       0
2           1   1   0   1   0       1       0       0       0
3           1   0   1   0   0       0       0       0       0
So here there are 9 permissible combinations of those binary variables, one for each pairing of a level of x1 with a level of x2. When you get up to 8 variables, each of your 6561 possible combinations of predictor values is represented by a permissible combination of these binary variables (obviously you can't have both x12 and x13 equal to 1 at the same time).
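To make the encoding concrete, the sketch below builds the two-factor design matrix on a made-up full grid of levels (the data frame d is an assumption, not your data) and checks three things: each interaction column is the product of the corresponding dummy columns, x12 and x13 are never both 1, and there are exactly 9 distinct rows.

```r
# Made-up data: every pairing of two 3-level factors.
lv <- factor(1:3)
d <- expand.grid(x1 = lv, x2 = lv)
mm <- model.matrix(~ (x1 + x2)^2, data = d)

# An interaction column is the elementwise product of its dummy columns.
prod_ok <- all(mm[, "x12:x22"] == mm[, "x12"] * mm[, "x22"])

# Dummies for the same factor are mutually exclusive.
both_on <- any(mm[, "x12"] == 1 & mm[, "x13"] == 1)

# Number of distinct rows, i.e. permissible combinations.
n_patterns <- nrow(unique(mm))

print(c(prod_ok = prod_ok, both_on = both_on, n_patterns = n_patterns))
```

So the 0/1 columns in your x are not new variables with odd values; they are indicators for particular levels and products of those indicators.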