Search code examples
routliers

DFFITs for Beta Regression


I am trying to calculate DFFITS for GLM, where responses follow a Beta distribution. By using betareg R package. But I think this package doesn't support influence.measures() because by using dffits() Code

require(betareg)
df<-data("ReadingSkills")
y<-ReadingSkills$accuracy
n<-length(y)

bfit<-betareg(accuracy ~ dyslexia + iq, data = ReadingSkills)
DFFITS<-dffits(bfit, infl=influence(bfit, do.coef = FALSE))
DFFITS

it yield

Error in if (model$rank == 0) { : argument is of length zero

I am a newbie in R. I don't know how to resolve this problem. Kindly help to solve this or give me some tips through R code that how to calculate DFFITs manually. Regards


Solution

  • dffits are not implemented for "betareg" objects, but you could try to calculate them manually.

    According to this Stack Overflow Q/A we could write this function:

    dffits1 <- function(x1, bres.type="response") {
      stopifnot(class(x1) %in% c("lm", "betareg"))
      sapply(1:length(x1$fitted.values), function(i) {
        x2 <- update(x1, data=x1$model[-i, ]) # leave one out
        h <- hatvalues(x1)
        nm <- rownames(x1$model[i, ])
        num_dffits <- suppressWarnings(predict(x1, x1$model[i, ]) - 
                                         predict(x2, x1$model[i, ]))
        residx <- if (class(x1) == "betareg") {
          betareg:::residuals.betareg(x2, type=bres.type)
        } else {
          x2$residuals
        }
        denom_dffits <- sqrt(c(crossprod(residx)) / x2$df.residual*h[i])
        return(num_dffits / denom_dffits)
      })
    }
    

    It works well for lm:

    fit <- lm(mpg ~ hp, mtcars)
    dffits1(fit)
    stopifnot(all.equal(dffits1(fit), dffits(fit)))
    

    Now let's try betareg:

    library(betareg)
    data("ReadingSkills")
    
    bfit <- betareg(accuracy ~ dyslexia + iq, data=ReadingSkills)
    dffits1(bfit)
    #           1           2           3           4           5           6           7 
    # -0.07590185 -0.21862047 -0.03620530  0.07349169 -0.11344968 -0.39255172 -0.25739032 
    #           8           9          10          11          12          13          14 
    #  0.33722706  0.16606198  0.10427684  0.11949807  0.09932991  0.11545263  0.09889406 
    #          15          16          17          18          19          20          21 
    #  0.21732090  0.11545263 -0.34296030  0.09850239 -0.36810187  0.09824013  0.01513643 
    #          22          23          24          25          26          27          28 
    #  0.18635669 -0.31192106 -0.39038732  0.09862045 -0.10859676  0.04362528 -0.28811277 
    #          29          30          31          32          33          34          35 
    #  0.07951977  0.02734462 -0.08419156 -0.38471945 -0.43879762  0.28583882 -0.12650591 
    #          36          37          38          39          40          41          42 
    # -0.12072976 -0.01701615  0.38653773 -0.06440176  0.15768684  0.05629040  0.12134228 
    #          43          44 
    #  0.13347935  0.19670715 
    

    Looks not bad.

    Notes:

    • Even if this works in code, you should check if it meets your statistical requirements!
    • I've used suppressWarnings in lines 5:6 of dffits1. predict(bfit, ReadingSkills) drops the contrasts somehow, whereas predict(bfit) does not (should practically be the same). However the results are identical: all.equal(predict(bfit, ReadingSkills), predict(bfit)), thus ignoring the warnings be safe.