r, regression, quantile, gam, qgam

Why does mgcViz not use the proper scatterplot for fitted QGAMs?


As the title suggests, the ordinary scatterplot of the data, with the default GAM fit overlaid, should look like this:

#### Libraries ####
library(tidyverse)
library(mgcViz)
library(qgam)

#### Save Data as Tibble ####
tib <- as_tibble(MASS::mcycle)
tib

#### Inspect Scatterplot ####
tib %>% 
  ggplot(aes(x=times,
             y=accel))+
  geom_point(alpha=.4)+
  theme_classic()+
  geom_smooth(method = "gam",
              formula = y ~ s(x))+
  labs(x="Times",
       y="Acceleration",
       title = "Normal Fitted GAM")

[Image: scatterplot of the mcycle data with the fitted GAM]

Fitting the QGAM is straightforward from here:

#### Set Quants and Fit Multiple Quantile GAM ####
q <- c(.2,.5,.8)

fit <- mqgam(accel ~ s(times),
             data = tib, 
             qu = q)

However, plotting each quantile fit shows that the points drawn by the mgcViz plotting methods for QGAMs don't match the original data points. Here I select just the .20 quantile fit to show what is happening, using mostly the same code as shown here.

#### Save QDO Objects ####
q2 <- qdo(fit,.2)

#### Grab Visual Data ####
pg2 <- getViz(q2)

#### Plot in MGCVIZ ####
final.plot <- plot(pg2, 
                   select = 1)
final.plot +
  l_fitLine(colour = "red") +
  l_rug(mapping = aes(x=x, 
                      y=y), 
        alpha = 0.8) +
  l_ciLine(mul = 5, 
           colour = "blue", 
           linetype = 2) + 
  l_points(shape = 19, 
           size = 1, 
           alpha = 0.1) + 
  theme_classic()

Here you can see that some values on the y-axis do not match where they should be on the original plot, circled here in green:

[Image: mgcViz plot of the .20 quantile fit, with the mismatched points circled in green]

How can I fix this issue?


Solution

  • The points in the plots are partial residuals, not the data. Residuals are defined with respect to a model; change the model and the partial residuals change. As you are fitting a lower-tail quantile, I'd expect some observations in the extreme of the opposite tail to be poorly fitted and hence to have larger partial residuals.

    Note that the large partial residuals you circle in green (one circle doesn't contain a residual point, so I'm not sure what's going on there) are where the data have high variance and hence are more dispersed. I would therefore expect larger partial residuals here, as there are more extreme observations in the tail of the distribution opposite to the one you are modelling.
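
If the goal is to see the raw observations together with the fitted quantiles, one option is to skip the partial-effect plot and draw predictions on the response scale instead. The following is a minimal sketch under that assumption, not something taken from the mgcViz documentation. It uses the same model and quantiles as the question; the names dat, qs, grid, preds, s_hat and p are illustrative only. It also extracts the estimated s(times) at the observed times, which mgcv centres through its sum-to-zero identifiability constraint, so the y-axis of the mgcViz panel is on the scale of the centred smooth rather than on the scale of accel.

```r
library(ggplot2)
library(qgam)

# Same data and quantiles as in the question
dat <- MASS::mcycle
qs  <- c(0.2, 0.5, 0.8)

set.seed(1)
fit <- mqgam(accel ~ s(times), data = dat, qu = qs)

# The smooth for the 0.2 quantile, evaluated at the observed times.
# It is centred around zero, unlike the raw accel values.
q2    <- qdo(fit, 0.2)
s_hat <- predict(q2, type = "terms")[, "s(times)"]
range(s_hat)
range(dat$accel)

# Grid of times on which to evaluate each fitted quantile curve
grid <- data.frame(times = seq(min(dat$times), max(dat$times),
                               length.out = 200))

# qdo() applies a function (here predict) to the fit for one quantile;
# predictions are on the response scale, i.e. they include the intercept
preds <- do.call(rbind, lapply(qs, function(qu) {
  data.frame(times    = grid$times,
             accel    = as.numeric(qdo(fit, qu, predict, newdata = grid)),
             quantile = factor(qu))
}))

# Raw data as points, one fitted curve per quantile
p <- ggplot(dat, aes(x = times, y = accel)) +
  geom_point(alpha = 0.4) +
  geom_line(data = preds, aes(colour = quantile)) +
  theme_classic() +
  labs(x = "Times",
       y = "Acceleration",
       title = "Raw data with fitted quantile curves")
p
```

Here the points are the observations themselves, so they line up with the original scatterplot, while the mgcViz panel remains useful for inspecting the partial effect of s(times) and its partial residuals.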