Search code examples
rglmrevoscaler

revoScaleR::rxGlm() Question in R - GLM Residuals


I might not find an answer here because I don't think the revoScaleR package is widely used.

If I create a GLM using rxGlm() it works fine. However the model residuals available via rxPredict() seem to just be the "raw" residuals, ie observed value minus fitted value. The various transformed versions (deviance residuals, pearson residuals, etc.) don't seem to be available.

Does anyone know if there's a way to achieve this? I can get deviance residuals (for example) for the model by running it again using glm() (with the same formula, data, error structure, link function, weights) and using residuals(glm_object, type = "deviance"), but this is a nuisance because glm() runs very slowly (large dataset, many model parameters).

Thanks.

Edited: to include this guidance from the literature which I'm trying to follow:

enter image description here


Solution

  • It is a bit hard to fully understand from your question what the RevoScaleR package provides in terms of residuals and which residuals exactly you need. In addition, there is quite some confusion regarding the terminology of residuals, as this exemplified for example here and here.

    A few points/observations which might help you nonetheless.

    In linear regression, "raw" are identical to "deviance" residuals

    At least that what I take from running toy regressions with glm and predicting outcomes like:

    df <- mtcars
    modl <- glm(formula = mpg ~ wt + qsec + am, data = mtcars)
    y_hat <- predict(modl)
    

    Next, calculate the "raw" residuals (predicted outcome minus actual outcome) as well as deviance residuals:

    y <- as.vector(df[["mpg"]])
    res_raw <- y - y_hat
    res_dev <- residuals(modl, type = "deviance")
    

    These two are identical:

    identical(res_raw, res_dev)
    [1] TRUE
    

    I guess it's more complicated once you get into binary outcomes etc.

    Formula for computing standardized deviance residuals

    Standardized deviance residuals are computed from glm with the rstandard method.

    res_std <- rstandard(modl)
    

    Looking at getAnywhere(rstandard.glm) tells you how the standardized residuals can be computed by hand from the deviance residuals:

    function (model, infl = influence(model, do.coef = FALSE), type = c("deviance", 
        "pearson"), ...) 
    {
        type <- match.arg(type)
        res <- switch(type, pearson = infl$pear.res, infl$dev.res)
        res <- res/sqrt(summary(model)$dispersion * (1 - infl$hat)) # this is the key line
        res[is.infinite(res)] <- NaN
        res
    }
    

    So in my example, you would manually compute standardized residuals by running res/sqrt(summary(modl)$dispersion * (1 - influence(modl)$hat)). So you need two things: hat and dispersion. I assume that RevoScaleR provides the dispersion parameter. If there is nothing in RevoScaleR like influence(modl)$hat to get the hat values, you'll have to do it from scratch:

    X <- as.matrix(df[, c("wt", "qsec", "am")]) # Gets the X variables
    X <- cbind(rep(1, nrow(df)), X) # adds column for the constant
    hat <- diag(X %*% solve(t(X) %*% X) %*% t(X)) # formula for hat values
    

    Now compute your standardized deviance residuals:

    res_man <- res_raw/sqrt(summary(modl)$dispersion * (1 - hat))
    

    Which are the same as derived with rstandard:

    head(res_man)
            Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant 
           -0.6254171        -0.4941877        -1.4885771         0.2297471         0.7217423        -1.1790097 
    head(res_std)
            Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant 
           -0.6254171        -0.4941877        -1.4885771         0.2297471         0.7217423        -1.1790097