All Levels of a Factor in a Model Matrix in R


I have a data.frame consisting of numeric and factor variables as seen below.

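# stringsAsFactors=TRUE is required on R >= 4.0.0, where strings are no longer
# converted to factors by default.
testFrame <- data.frame(First=sample(1:10, 20, replace=TRUE),
           Second=sample(1:20, 20, replace=TRUE), Third=sample(1:10, 20, replace=TRUE),
           Fourth=rep(c("Alice","Bob","Charlie","David"), 5),
           Fifth=rep(c("Edward","Frank","Georgia","Hank","Isaac"),4),
           stringsAsFactors=TRUE)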

I want to build a matrix that assigns dummy variables to the factors and leaves the numeric variables alone.

model.matrix(~ First + Second + Third + Fourth + Fifth, data=testFrame)

As expected, this leaves out one level of each factor as the reference level, just as lm does. However, I want a matrix with a dummy/indicator variable for every level of every factor. I am building this matrix for glmnet, so I am not worried about multicollinearity.
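To make the dropped reference level concrete, here is a minimal sketch on a tiny data frame. The names `d`, `x`, `g` and `m` are hypothetical and used only for illustration.

```r
# Hypothetical small frame, used only to show the default behaviour.
d <- data.frame(x = 1:4, g = factor(c("Alice", "Bob", "Charlie", "David")))

# Default treatment contrasts: the first level ("Alice") becomes the reference.
m <- model.matrix(~ x + g, data = d)
colnames(m)
# "(Intercept)" "x" "gBob" "gCharlie" "gDavid"   (there is no "gAlice" column)
```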

Is there a way to have model.matrix create a dummy for every level of each factor?


Solution

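  • You need to reset the contrasts for the factor variables (the columns must actually be factors, so on R 4.0.0 or later create the data frame with stringsAsFactors=TRUE):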

    model.matrix(~ Fourth + Fifth, data=testFrame,
            contrasts.arg=list(Fourth=contrasts(testFrame$Fourth, contrasts=FALSE),
                    Fifth=contrasts(testFrame$Fifth, contrasts=FALSE)))
    

    or, with a little less typing but without proper column names (the columns come out as Fourth1, Fourth2, ... rather than FourthAlice, FourthBob, ...):

    model.matrix(~ Fourth + Fifth, data=testFrame, 
        contrasts.arg=list(Fourth=diag(nlevels(testFrame$Fourth)), 
                Fifth=diag(nlevels(testFrame$Fifth))))
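
With many factors, writing out each entry of contrasts.arg by hand gets tedious. The following is a sketch of one way to build the list automatically. It assumes every text column is stored as a factor, and the names `df`, `factorCols`, `fullContrasts` and `mm` are hypothetical and used only for illustration.

```r
# Hypothetical example data; stringsAsFactors = TRUE makes the text columns factors.
df <- data.frame(
  First  = 1:20,
  Fourth = rep(c("Alice", "Bob", "Charlie", "David"), 5),
  Fifth  = rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4),
  stringsAsFactors = TRUE
)

# Pick out the factor columns, keeping a data.frame even if there is only one.
factorCols <- df[, sapply(df, is.factor), drop = FALSE]

# One full (identity) contrast matrix per factor, named after its column.
fullContrasts <- lapply(factorCols, contrasts, contrasts = FALSE)

# "~ ." uses every column of df; numeric columns pass through unchanged.
mm <- model.matrix(~ ., data = df, contrasts.arg = fullContrasts)
colnames(mm)
```

Because the contrast matrices come from contrasts(), the columns keep their level names (FourthAlice, FifthEdward, and so on). The intercept column is still included; add `- 1` to the formula if it is not wanted.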