I am trying to use GAM smoothing in ggplot2. According to this conversation and this code, ggplot2 loads the mgcv package, which is used for generalized additive models, only if n >= 1000; otherwise the user has to load the package manually. As far as I understand, this example code from the conversation should do the smoothing using geom_smooth(method="gam", formula = y ~ s(x, bs = "cs")):
library(ggplot2)
dat.large <- data.frame(x=rnorm(10000), y=rnorm(10000))
ggplot(dat.large, aes(x=x, y=y)) + geom_smooth()
But I get an error:
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in s(x, bs = "cs") : object 'x' not found
The same error happens if I try the following:
ggplot(dat.large, aes(x=x, y=y)) + geom_point() + geom_smooth(method="gam", formula = y ~ s(x, bs = "cs"))
But a linear model, for example, works:
ggplot(dat.large, aes(x=x, y=y)) + geom_smooth(method = "lm", formula = y ~ x)
What am I doing wrong here?
My R and package versions should be up-to-date:
R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
other attached packages: mgcv_1.7-29 ggplot2_0.9.3.1
The problem was that I had the summary function assigned as s in my .Rprofile. That assignment masked the s() function that gam uses for smooth terms in its formula, so s(x, bs = "cs") ended up calling summary instead. I guess one should avoid assigning too many shorthands. After removing that assignment, everything works as it should.
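A quick way to check for this kind of masking is to ask R where it finds s. The snippet below is a minimal sketch using base R only; it recreates the shorthand with assign() (assumed to be equivalent to writing s <- summary at top level in .Rprofile) and then inspects the search path. The variable names where_s and masked_in_global are illustrative.

```r
# Assumption: this mimics the shorthand from .Rprofile (s <- summary at top level)
assign("s", base::summary, envir = globalenv())

# Every place on the search path that defines an object called `s`,
# in lookup order; ".GlobalEnv" comes before any attached package
where_s <- find("s")
print(where_s)

# TRUE means the shorthand lives in the global environment, so it is
# found before a package's s() when the name is looked up from there
masked_in_global <- exists("s", envir = globalenv(), inherits = FALSE)
print(masked_in_global)
```

If ".GlobalEnv" shows up in the result, the shorthand is likely what a formula such as y ~ s(x, bs = "cs") picks up.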
One way to keep .Rprofile shorthands from confusing packages is to assign them to a hidden environment and attach that environment in .Rprofile. For example (the code is borrowed from here):
.env <- new.env()
.env$s <- base::summary
attach(.env)
Then s works as summary until mgcv is loaded:
dat.large <- data.frame(x=rnorm(10000), y=rnorm(10000))
s(dat.large)
       x                   y
 Min.   :-3.823756   Min.   :-4.531882
 1st Qu.:-0.683730   1st Qu.:-0.687335
 Median :-0.006945   Median :-0.009993
 Mean   :-0.010285   Mean   :-0.000491
 3rd Qu.: 0.665435   3rd Qu.: 0.672098
 Max.   : 3.694357   Max.   : 3.647825
After the package is loaded, s changes meaning (it now returns a smooth specification instead of a summary), but the package's own functionality is not affected:
ggplot(dat.large, aes(x=x, y=y)) + geom_smooth() # works
s(dat.large)
$term
[1] "dat.large"
$bs.dim
[1] -1
$fixed
[1] FALSE
$dim
[1] 1
$p.order
[1] NA
$by
[1] "NA"
$label
[1] "s(dat.large)"
$xt
NULL
$id
NULL
$sp
NULL
attr(,"class")
[1] "tp.smooth.spec"
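The change in meaning is consistent with how attach() orders the search path: by default a newly attached environment or package goes to position 2, in front of anything attached earlier. The sketch below illustrates this with base R only. The environment fake_pkg and the names "shorthands" and "fake_pkg" are hypothetical stand-ins (fake_pkg plays the role of a package such as mgcv that exports its own s), not part of any real package.

```r
# Hidden environment holding the shorthand, as in the .Rprofile approach
.env <- new.env()
.env$s <- base::summary
attach(.env, name = "shorthands")

# Hypothetical stand-in for a package that defines its own s()
fake_pkg <- new.env()
fake_pkg$s <- function(...) "package version of s"
attach(fake_pkg, name = "fake_pkg")

# The environment attached later sits earlier on the search path
pos_fake  <- match("fake_pkg", search())
pos_short <- match("shorthands", search())
print(pos_fake < pos_short)

# find() lists the definitions of `s` in lookup order
found <- find("s")
print(found)
```

Because the later entry wins, the shorthand is masked by the package rather than the other way round, which is the opposite of what happens when s is assigned directly in the global environment.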
EDIT: The workaround above did not seem to work in my actual code, which is much more complicated. If you want to keep that summary shorthand, the easiest workaround is just to place rm(s) before loading mgcv.
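A slightly more defensive version of that rm(s) step might look like the following. This is only a sketch: it assumes the shorthand was created at top level (recreated here with assign() so the example is self-contained), and it guards the library() call so the snippet still runs where mgcv is not installed. The variable still_defined is illustrative.

```r
# Assumption: the shorthand exists in the global environment, e.g. from .Rprofile
assign("s", base::summary, envir = globalenv())

# Remove the shorthand only if it is actually defined there;
# a bare rm(s) would warn if the object were missing
if (exists("s", envir = globalenv(), inherits = FALSE)) {
  rm("s", envir = globalenv())
}

# Load mgcv afterwards, so its s() is no longer shadowed by the shorthand
if (requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
}

still_defined <- exists("s", envir = globalenv(), inherits = FALSE)
print(still_defined)
```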