r, ggplot2, mgcv

Problems with "gam" smoothing in ggplot2


I am trying to use GAM smoothing in ggplot2. According to this conversation and this code, ggplot2 loads the mgcv package (used for generalized additive models) only if n >= 1000; otherwise the user has to load the package manually. As far as I understand, the following example code from the conversation should do the smoothing using geom_smooth(method="gam", formula = y ~ s(x, bs = "cs")):

library(ggplot2)
dat.large <- data.frame(x=rnorm(10000), y=rnorm(10000))
ggplot(dat.large, aes(x=x, y=y)) + geom_smooth() 

But I get an error:

geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in s(x, bs = "cs") : object 'x' not found

The same error happens if I try the following:

ggplot(dat.large, aes(x=x, y=y)) + geom_point() + geom_smooth(method="gam", formula = y ~ s(x, bs = "cs"))

But a linear model, for example, works:

ggplot(dat.large, aes(x=x, y=y)) + geom_smooth(method = "lm", formula = y ~ x)
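
You can fit the same model with mgcv's gam() directly, outside ggplot2, to tell whether ggplot2 or the formula's s() term is at fault. This sketch is not from the original post, and set.seed() is added only for reproducibility. If it fails with the same "object 'x' not found" error, the problem lies in how s() is resolved rather than in geom_smooth().

```r
library(mgcv)  # attach mgcv so that s() is available to the formula

set.seed(1)  # only to make the sketch reproducible
dat.large <- data.frame(x = rnorm(10000), y = rnorm(10000))

# Same smooth that geom_smooth() picks when method = "auto" and n >= 1000
fit <- gam(y ~ s(x, bs = "cs"), data = dat.large)
summary(fit)
```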

What am I doing wrong here?

My R and package versions should be up to date:

R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

other attached packages: mgcv_1.7-29  ggplot2_0.9.3.1 

Solution

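  • The problem was that my .Rprofile assigned the summary function to s. That shorthand masked mgcv's s() function, which gam formulas use to define smooth terms. As a result, s(x, bs = "cs") in the formula ended up calling summary(), which explains the "object 'x' not found" error. I guess one should avoid assigning too many shorthands. After removing that assignment, everything works as it should.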
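
    The masking can be diagnosed with base R. This sketch assumes mgcv is installed and nothing else on the search path defines s. It recreates the .Rprofile shorthand in the global environment and uses find() to list, in lookup order, every search-path entry that defines s:

    ```r
    library(mgcv)

    # Stand-in for the .Rprofile line: same effect as a top-level `s <- summary`
    assign("s", base::summary, envir = globalenv())

    # find() lists every search-path entry that defines `s`, in lookup order;
    # ".GlobalEnv" is searched first, so its `s` hides the one in package:mgcv
    where <- find("s")
    print(where)

    # An unqualified s() therefore resolves to summary() ...
    print(identical(get("s", envir = globalenv()), base::summary))
    # ... while mgcv's smooth constructor is still reachable as mgcv::s
    print(is.function(mgcv::s))
    ```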

    One way to keep .Rprofile shorthands from confusing packages is to assign them in a hidden environment and attach that environment in .Rprofile. For example (the code is borrowed from here):

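    .env <- new.env()
    .env$s <- base::summary   # the shorthand lives in .env, not in the global environment
    attach(.env)              # put .env on the search path; packages attached later go in front of it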
    

    Then s works as summary until mgcv is loaded:

    dat.large <- data.frame(x=rnorm(10000), y=rnorm(10000))
    s(dat.large)
           x                   y            
     Min.   :-3.823756   Min.   :-4.531882  
     1st Qu.:-0.683730   1st Qu.:-0.687335  
     Median :-0.006945   Median :-0.009993  
     Mean   :-0.010285   Mean   :-0.000491  
     3rd Qu.: 0.665435   3rd Qu.: 0.672098  
     Max.   : 3.694357   Max.   : 3.647825  
    

    After the package is loaded, s changes meaning, but it no longer interferes with the package's own functionality:

    ggplot(dat.large, aes(x=x, y=y)) + geom_smooth() # works
    s(dat.large)
    $term
    [1] "dat.large"
    
    $bs.dim
    [1] -1
    
    $fixed
    [1] FALSE
    
    $dim
    [1] 1
    
    $p.order
    [1] NA
    
    $by
    [1] "NA"
    
    $label
    [1] "s(dat.large)"
    
    $xt
    NULL
    
    $id
    NULL
    
    $sp
    NULL
    
    attr(,"class")
    [1] "tp.smooth.spec"
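
    The hidden-environment trick relies on search-path order. The global environment is always searched first, whereas library() attaches a package at position 2 by default, in front of anything attached earlier, such as .env. Here is a sketch of that ordering, assuming a fresh session in which mgcv has not been attached yet:

    ```r
    # What the .Rprofile does: keep the shorthand in an attached environment
    .env <- new.env()
    .env$s <- base::summary
    attach(.env)            # appears on the search path as ".env"

    print(find("s"))        # before mgcv is attached, only ".env" defines s

    library(mgcv)           # attached at position 2, in front of .env

    lookup_order <- find("s")
    print(lookup_order)     # "package:mgcv" now comes before ".env"
    print(identical(get("s", envir = globalenv()), mgcv::s))

    detach(".env")          # tidy up the search path
    ```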
    

    EDIT: The workaround above did not seem to work in my actual code, which is much more complicated. If you want to keep the summary shorthand, the easiest workaround is simply to call rm(s) before loading mgcv.
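
    A slightly more defensive form of the rm(s) workaround removes the global shorthand only if it exists. This avoids rm()'s "object not found" warning when the script runs without that .Rprofile. It is a sketch: the assign() line only simulates the .Rprofile shorthand, and exists(..., inherits = FALSE) checks the global environment alone.

    ```r
    # Simulate the .Rprofile shorthand (not needed in real use)
    assign("s", base::summary, envir = globalenv())

    # Remove a global `s` shorthand, if any, before mgcv's s() is needed
    if (exists("s", envir = globalenv(), inherits = FALSE)) {
      rm("s", envir = globalenv())
    }
    library(mgcv)

    set.seed(1)
    dat.large <- data.frame(x = rnorm(10000), y = rnorm(10000))
    fit <- gam(y ~ s(x, bs = "cs"), data = dat.large)
    ```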