Search code examples
rregressionpredictgamgamlss

r gamlss: prediction of z-scores using regression with multiple explanatory variables


The function centiles.pred is a great option to extract z-scores based on a gamlss model like in the following code:

library(gamlss)
FIT = gamlss(mpg ~ disp, data = mtcars, family = BCPE)

NEWDATA = data.frame(disp = 300, mpg  = 17)
centiles.pred(FIT, xvalues = NEWDATA$disp, xname = "disp", yval = NEWDATA$mpg, type = "z-scores")

However, the help-page of centiles.pred says "A restriction of the function is that it applies to models with only one explanatory variable". In many cases, however, you have more than one explanatory variable as in the following example:

FIT = gamlss(mpg ~ disp + qsec, data = mtcars, family = BCPE)

My question is: Is there a workable way to calculate z-scores and centiles (also according to the arguments family = "standard-centiles", and family = "centiles" in the function centiled.pred) from a gamlss model with more than one explanatory variable?


Solution

  • The function predictAll(FIT,newdata= ) gives the fitted parameters (mu, sigma, nu, tau) for new values of display and qsec. [See Stasinopoulos et al. (2017) page 143.]

    Then with the fitted (mu, sigma, nu, tau) use the cdf of BCPE (i.e. pBCPE) to find the probability (say p) of being below your corresponding new value of mpg.

    The z-score is then given by qNO(p)

    NOTE: example code

    NEWDATA = data.frame(disp = 300, qsec = 50)

    params <- predictAll(FIT,newdata=NEWDATA)

    For mpg = 17, then

    p <- pBCPE(17, params$mu, params$sigma, params$nu, params$tau)

    The z-score is then given by

    z <- qNO(p)

    qNo is the inverse cdf of the standard normal distribution.