Search code examples
juliaglm

Julia GLM - using devresid for plotting


I would like to do some residual analysis for a GLM.

My model is in the form

using GLM
model = glm(@formula(y ~ x), data, Binomial(), LogitLink())

My text book suggests that residual analysis in the GLM be performed using deviance residuals. I was glad to see that Julia's GLM has a devresid() function, and that it suggests how to use it for plotting (sign(y - μ) * sqrt(devresid(D, y, μ))). However, I'm at a total loss as to what the input arguments are supposed to be. Looking at the doc-string:

?devresid
devresid(D, y, μ::Real)

Return the squared deviance residual of μ from y for distribution D

The deviance of a GLM can be evaluated as the sum of the squared deviance residuals. This is the principal use for these values. The actual deviance residual, say for plotting, is the signed square root of this value

sign(y - μ) * sqrt(devresid(D, y, μ))

Examples

julia> devresid(Normal(), 0, 0.25) ≈ abs2(0.25)
true

julia> devresid(Bernoulli(), 1, 0.75) ≈ -2*log(0.75)
true

julia> devresid(Bernoulli(), 0, 0.25) ≈ -2*log1p(-0.25)
true
  • D: I'm guessing that in my case it is Binomial()
  • y: I'm guessing this is the indicator variable for a single case, i.e. 1 or 0
  • μ: What is this?

How can I use this function to produce things like plots of the deviance residual on a normal probability scale and versus fitted values?

Here's the data I'm using in CSV form

x,y
400,0
220,1
490,0
210,1
500,0
270,0
200,1
470,0
480,0
310,1
240,1
490,0
420,0
330,1
280,1
210,1
300,1
470,1
230,0
430,0
460,0
220,1
250,1
200,1
390,0

Solution

  • I understand this is what you want:

    julia> data = DataFrame(X=[1,1,1,2,2], Y=[1,1,0,0,1])
    5×2 DataFrame
    │ Row │ X     │ Y     │
    │     │ Int64 │ Int64 │
    ├─────┼───────┼───────┤
    │ 1   │ 1     │ 1     │
    │ 2   │ 1     │ 1     │
    │ 3   │ 1     │ 0     │
    │ 4   │ 2     │ 0     │
    │ 5   │ 2     │ 1     │
    
    julia> model = glm(@formula(Y ~ X), data, Binomial(), LogitLink())
    StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Binomial{Float64},LogitLink},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
    
    Y ~ 1 + X
    
    Coefficients:
    ─────────────────────────────────────────────────────────────────────────────
                  Estimate  Std. Error    z value  Pr(>|z|)  Lower 95%  Upper 95%
    ─────────────────────────────────────────────────────────────────────────────
    (Intercept)   1.38629      2.82752   0.490286    0.6239   -4.15554    6.92813
    X            -0.693146     1.87049  -0.37057     0.7110   -4.35923    2.97294
    ─────────────────────────────────────────────────────────────────────────────
    
    julia> p = predict(model)
    5-element Array{Float64,1}:
     0.6666664218508201
     0.6666664218508201
     0.6666664218508201
     0.5
     0.5
    
    julia> y = data.Y
    5-element Array{Int64,1}:
     1
     1
     0
     0
     1
    
    julia> @. sign(y - p) * sqrt(devresid(Bernoulli(), y, p))
    5-element Array{Float64,1}:
      0.9005170462928523
      0.9005170462928523
     -1.4823033118905455
     -1.1774100225154747
      1.1774100225154747
    

    (this is what you would get from calling residuals(model, type="deviance") in R)

    Note that in the last line I use @. to vectorize the whole line. Alternatively you could have written it as:

    julia> sign.(y .- p) .* sqrt.(devresid.(Bernoulli(), y, p))
    5-element Array{Float64,1}:
      0.9005170462928523
      0.9005170462928523
     -1.4823033118905455
     -1.1774100225154747
      1.1774100225154747