Search code examples
rgammgcv

Plotting factor-by-curve tensor product smooths of mgcv::gam


So I have a statistical model in which I am using factor-by-curves, i.e., fitting separate smoothing curves for some categories, as in the following. (I'm not paying much attention to the data/model's meaning, just using it as a minimal example.)

library(dplyr)
library(qgam)
library(mgcv)
data(UKload)
test <- gam(
  NetDemand ~ te(wM, Posan, by = Year), 
  data = UKload %>% mutate(Year = as.factor(Year))
)

When I was simply doing smoothing curves by s instead of tensors, I was happy to use visreg package as follows:

library(visreg)
test2 <- gam(
  NetDemand ~ s(wM, by = Year),
  data = UKload %>% mutate(Year = as.factor(Year))
)
visreg(test2, xvar = "wM", by = "Year")

enter image description here

However, I don't seem to be able to do something similar when I am including a tensor---it will simply draw a contour plot with full data, instead of having it partitioned by the factor variable of interest:

visreg2d(test, xvar = "wM", yvar = "Posan", by = "Year")

Warning message: In title(...) : "by" is not a graphical parameter

enter image description here

I can do the mgcv::vis.gam with a condition:

vis.gam(test, plot.type = "contour", cond = list(Year = 2011))

and later aggregate the plots by Rmisc::multiplot or base plot, but I'm not too happy with these solutions, both in terms of aesthetics and the workflow. Any handy tips on making pretty plots for tensor product smooths with factor-by-curves?


Solution

  • Depends what you mean by prettier? ;-)

    My gratia package will produce plots of factor-by smooths. For example

    draw(test, ncol = 2)
    

    produces

    enter image description here

    The grey parts of the surfaces are where one would be extrapolating too far from the available data. How far is "too far" is controlled by the dist argument, which by default is set to mark any point on the grid as NA if it is more than 10% (dist = 0.1) of the range of the data away from the nearest data point.

    I haven't got round to allowing these surfaces to be plotted on the same scale and to having a common colourbar legend, but gratia is very much a work in progress.

    If you want to do the plotting yourself, then gratia can also produce a tidy-like object (a tibble, with data arranged in a form suitable for plotting with ggplot2) via the evaluate_smooth() function

    > es <- evaluate_smooth(test, smooth = 'te(wM,Posan)')
    > es
    # A tibble: 60,000 x 7
       smooth                by_variable     wM   Posan   est    se Year 
       <chr>                 <fct>        <dbl>   <dbl> <dbl> <dbl> <fct>
     1 te(wM,Posan):Year2011 Year        -1.43  0.00137 7556. 1516. 2011 
     2 te(wM,Posan):Year2011 Year        -1.11  0.00137 7506. 1466. 2011 
     3 te(wM,Posan):Year2011 Year        -0.789 0.00137 7456. 1417. 2011 
     4 te(wM,Posan):Year2011 Year        -0.470 0.00137 7405. 1368. 2011 
     5 te(wM,Posan):Year2011 Year        -0.150 0.00137 7355. 1319. 2011 
     6 te(wM,Posan):Year2011 Year         0.169 0.00137 7305. 1271. 2011 
     7 te(wM,Posan):Year2011 Year         0.489 0.00137 7255. 1224. 2011 
     8 te(wM,Posan):Year2011 Year         0.808 0.00137 7205. 1178. 2011 
     9 te(wM,Posan):Year2011 Year         1.13  0.00137 7154. 1132. 2011 
    10 te(wM,Posan):Year2011 Year         1.45  0.00137 7104. 1087. 2011 
    # … with 59,990 more rows
    

    Here you can see that there are variables coding the particular smooth, an indication of what the by variable is, and all the data columns associate with the surfaces shown above. Here wM and Posan are evaluated on a 100x100 grid of points over the range of the data before evaluating the smooth at those combinations of covariates.