Search code examples
rstatisticsextractplyrgraph-modelling-language

Extract the deviance from a list of GLM models using ldply()


I want to extract the deviance from a list of GLM models using ldply()

Example data (from R base installation):

    library(reshape2)
    library(plyr)
    mtcars.1 <- mtcars[, c("am",  "qsec" , "drat")  ]
    mtcars.m <- melt(mtcars.1, id= 1 ) 

    glm.cars <- dlply( mtcars.m ,  .(variable) ,  
    glm,  formula=  am ~ value , family=binomial )  

Got this far:

    ldply(  glm.cars  ,  summarise ,   "Null Deviance" = null.deviance , 
        "Residual Deviance" = deviance , "Deviance"= "??"    )

Which gives this:

      variable  Null Deviance     Residual Deviance    Deviance
1     qsec      43.22973          41.46512             ??
2     drat      43.22973          21.65003             ??

Deviance is missing! How do I extract it?

So how do I extract the deviance in the example above?

Off course i could just do null.deviance + deviance , but I just dont feel like doing it that way. I guess the reason I that I want to get better acquainted with the G statistic. I feel I go through the steops of extracting, subtracting and doing the chisqr, I will learn it better.

PS I was very confused to find that glm.model$devianc


Solution

  • As you say, you are confused. For each model you have two deviances. It's the difference in those two deviances (... not their sum) that is the interesting statistical measure. (I'm guessing you were making an analogy to the additive nature of residual sums of squares and model sums of squares, but if so, then you followed the wrong rabbit down the linguistic analogy hole.) You need to compare the difference to a 95% chi-square value with the same degrees of freedom as the difference in degrees of freedom between the null model and the "residual model". If you do str(.) on the models you can scroll down the list output to find that ther names are:

     str(glm(am~qsec, mtcars, family=binomial)  )
     .....
     $ deviance         : num 41.5
     $ aic              : num 45.5
     $ null.deviance    : num 43.2
     .....
     $ df.residual      : int 30
     $ df.null          : int 31
     .....
    

    So your dlply code needs to extract those and and then you calculate null.deviance-deviance and df.null -df.residual and perhaps display qchisq(0.95, df.null-df.residual). If you want to see how it got packaged by the R-Core then look at:

     anova( glm(am~qsec, mtcars, family=binomial)  )