So I have a statistical model in which I am using factor-by-curves, i.e., fitting separate smoothing curves for some categories, as in the following. (I'm not paying much attention to the data/model's meaning, just using it as a minimal example.)
library(dplyr)
library(qgam)
library(mgcv)
data(UKload)
test <- gam(
NetDemand ~ te(wM, Posan, by = Year),
data = UKload %>% mutate(Year = as.factor(Year))
)
When I was simply doing smoothing curves by s
instead of tensors, I was happy to use visreg
package as follows:
library(visreg)
test2 <- gam(
NetDemand ~ s(wM, by = Year),
data = UKload %>% mutate(Year = as.factor(Year))
)
visreg(test2, xvar = "wM", by = "Year")
However, I don't seem to be able to do something similar when I am including a tensor---it will simply draw a contour plot with full data, instead of having it partitioned by the factor variable of interest:
visreg2d(test, xvar = "wM", yvar = "Posan", by = "Year")
Warning message: In title(...) : "by" is not a graphical parameter
I can do the mgcv::vis.gam
with a condition:
vis.gam(test, plot.type = "contour", cond = list(Year = 2011))
and later aggregate the plots by Rmisc::multiplot
or base plot
, but I'm not too happy with these solutions, both in terms of aesthetics and the workflow. Any handy tips on making pretty plots for tensor product smooths with factor-by-curves?
Depends what you mean by prettier? ;-)
My gratia package will produce plots of factor-by smooths. For example
draw(test, ncol = 2)
produces
The grey parts of the surfaces are where one would be extrapolating too far from the available data. How far is "too far" is controlled by the dist
argument, which by default is set to mark any point on the grid as NA
if it is more than 10% (dist = 0.1
) of the range of the data away from the nearest data point.
I haven't got round to allowing these surfaces to be plotted on the same scale and to having a common colourbar legend, but gratia is very much a work in progress.
If you want to do the plotting yourself, then gratia can also produce a tidy-like object (a tibble, with data arranged in a form suitable for plotting with ggplot2) via the evaluate_smooth()
function
> es <- evaluate_smooth(test, smooth = 'te(wM,Posan)')
> es
# A tibble: 60,000 x 7
smooth by_variable wM Posan est se Year
<chr> <fct> <dbl> <dbl> <dbl> <dbl> <fct>
1 te(wM,Posan):Year2011 Year -1.43 0.00137 7556. 1516. 2011
2 te(wM,Posan):Year2011 Year -1.11 0.00137 7506. 1466. 2011
3 te(wM,Posan):Year2011 Year -0.789 0.00137 7456. 1417. 2011
4 te(wM,Posan):Year2011 Year -0.470 0.00137 7405. 1368. 2011
5 te(wM,Posan):Year2011 Year -0.150 0.00137 7355. 1319. 2011
6 te(wM,Posan):Year2011 Year 0.169 0.00137 7305. 1271. 2011
7 te(wM,Posan):Year2011 Year 0.489 0.00137 7255. 1224. 2011
8 te(wM,Posan):Year2011 Year 0.808 0.00137 7205. 1178. 2011
9 te(wM,Posan):Year2011 Year 1.13 0.00137 7154. 1132. 2011
10 te(wM,Posan):Year2011 Year 1.45 0.00137 7104. 1087. 2011
# … with 59,990 more rows
Here you can see that there are variables coding the particular smooth, an indication of what the by
variable is, and all the data columns associate with the surfaces shown above. Here wM
and Posan
are evaluated on a 100x100 grid of points over the range of the data before evaluating the smooth at those combinations of covariates.