Is it possible to use model matrix in a regression without having the name of the model matrix in the regression results?
I need to go through such a procedure as I have some interactions for which I have no observation. (i.e) the results of the interaction is NA
.
A related question can be found here.
Here are some data to illustrate my point:
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
str(mydata)
gre_ <- mydata$gre-mean(mydata$gre)
a <- model.matrix(~-1+gre_:factor(rank),data=mydata)[,-c(2)]
summary(glm(admit~gpa+gre+factor(rank)+a,data=mydata, family=binomial))
results
Call:
glm(formula = admit ~ gpa + gre + rank + a, family = binomial,
data = mydata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6449 -0.8886 -0.6332 1.1706 2.1949
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.0039781 1.4012928 -2.144 0.0321 *
gpa 0.7634679 0.3297215 2.315 0.0206 *
gre 0.0016098 0.0016634 0.968 0.3332
rank -0.5584921 0.1288588 -4.334 1.46e-05 ***
agre_:factor(rank)1 0.0014010 0.0028001 0.500 0.6168
agre_:factor(rank)3 0.0010074 0.0025007 0.403 0.6871
agre_:factor(rank)4 0.0009936 0.0034111 0.291 0.7708
---
How do we get rid of the model.matrix
name a in the results?
If you run with this formula syntax, R is going to put the "a" there. You can extract the names of coefficients and remove the first "a" if you like with gsub()
or use substr()
to remove the first letter. It depends on how you want to process them down the line.
The other option is to use glm.fit
and specify the complete model matrix yourself. Something like
a <- model.matrix(~-1+gre_:factor(rank),data=mydata)[,-c(2)]
b <- model.matrix(~gpa+gre+rank, data=mydata)
mm<-cbind(b,a)
ff<-glm.fit(mm,mydata$admit, family=binomial())
class(ff)<-c("glm","lm")
summary(ff)
will return
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6449 -0.8886 -0.6332 1.1706 2.1949
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.0039781 1.4012928 -2.144 0.0321 *
gpa 0.7634679 0.3297215 2.315 0.0206 *
gre 0.0016098 0.0016634 0.968 0.3332
rank -0.5584921 0.1288588 -4.334 1.46e-05 ***
gre_:factor(rank)1 0.0014010 0.0028001 0.500 0.6168
gre_:factor(rank)3 0.0010074 0.0025007 0.403 0.6871
gre_:factor(rank)4 0.0009936 0.0034111 0.291 0.7708
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 459.13 on 393 degrees of freedom
AIC: 473.13
Number of Fisher Scoring iterations: 4
Here your estimates are the same and your variable names are untouched. Since its not technically a real glm
object we are fibbing a bit by adding the class information, but it does have nearly all the same properties (you can see the "call" is missing) and in most cases should behave like a regular glm
object with most functions, including summary()
.