Search code examples
rinteractionmodel.matrix

prevent the name of model.matrix to appear in the results on a regression


Is it possible to use model matrix in a regression without having the name of the model matrix in the regression results?

I need to go through such a procedure as I have some interactions for which I have no observation. (i.e) the results of the interaction is NA.

A related question can be found here.

Here are some data to illustrate my point:

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
str(mydata)

gre_ <- mydata$gre-mean(mydata$gre)

a <- model.matrix(~-1+gre_:factor(rank),data=mydata)[,-c(2)]

summary(glm(admit~gpa+gre+factor(rank)+a,data=mydata, family=binomial))

results

Call:
glm(formula = admit ~ gpa + gre + rank + a, family = binomial, 
data = mydata)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6449  -0.8886  -0.6332   1.1706   2.1949  

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)         -3.0039781  1.4012928  -2.144   0.0321 *  
gpa                  0.7634679  0.3297215   2.315   0.0206 *  
gre                  0.0016098  0.0016634   0.968   0.3332    
rank                -0.5584921  0.1288588  -4.334 1.46e-05 ***
agre_:factor(rank)1  0.0014010  0.0028001   0.500   0.6168    
agre_:factor(rank)3  0.0010074  0.0025007   0.403   0.6871    
agre_:factor(rank)4  0.0009936  0.0034111   0.291   0.7708    
---

How do we get rid of the model.matrix name a in the results?


Solution

  • If you run with this formula syntax, R is going to put the "a" there. You can extract the names of coefficients and remove the first "a" if you like with gsub() or use substr() to remove the first letter. It depends on how you want to process them down the line.

    The other option is to use glm.fit and specify the complete model matrix yourself. Something like

    a <- model.matrix(~-1+gre_:factor(rank),data=mydata)[,-c(2)]
    b <- model.matrix(~gpa+gre+rank, data=mydata)
    mm<-cbind(b,a)
    
    ff<-glm.fit(mm,mydata$admit, family=binomial())
    class(ff)<-c("glm","lm")
    summary(ff)
    

    will return

    Call:
    NULL
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -1.6449  -0.8886  -0.6332   1.1706   2.1949  
    
    Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
    (Intercept)        -3.0039781  1.4012928  -2.144   0.0321 *  
    gpa                 0.7634679  0.3297215   2.315   0.0206 *  
    gre                 0.0016098  0.0016634   0.968   0.3332    
    rank               -0.5584921  0.1288588  -4.334 1.46e-05 ***
    gre_:factor(rank)1  0.0014010  0.0028001   0.500   0.6168    
    gre_:factor(rank)3  0.0010074  0.0025007   0.403   0.6871    
    gre_:factor(rank)4  0.0009936  0.0034111   0.291   0.7708    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 499.98  on 399  degrees of freedom
    Residual deviance: 459.13  on 393  degrees of freedom
    AIC: 473.13
    
    Number of Fisher Scoring iterations: 4
    

    Here your estimates are the same and your variable names are untouched. Since its not technically a real glm object we are fibbing a bit by adding the class information, but it does have nearly all the same properties (you can see the "call" is missing) and in most cases should behave like a regular glm object with most functions, including summary().