
R GAM visualisation, geom_smooth not fit to all observed data


I've made a GAM model in R using the following code:

library(mgcv)
library(ggplot2)

mod_gam1 <- gam(y ~ s(ï..x), data = Bird.data, method = "REML")
plot(mod_gam1)
coef(mod_gam1)
plot(mod_gam1, residuals = TRUE, pch = 1)

mod_gam1$fitted.values

result <- data.frame(data = c(mod_gam1$fitted.values, Bird.data$y),
                     Year = rep(1991:2019, times = 2),
                     source = c(rep('Modelled', times = 29), rep('Observed', times = 29)))
ggplot(result, aes(x = Year, y = data, colour = source)) +
  geom_point() +
  geom_smooth(span = 0.8) +
  labs(x = "Year", y = "Bird Island Total Debris Count") +
  scale_y_continuous(limits = c(0, 1000))

The output looks OK, but the shaded confidence band drawn by geom_smooth doesn't extend across my whole dataset: it stops short of my first two data points, and I am not sure why.

Any help would be appreciated!

I can't upload a picture because I am new to the site, but basically I have two series (observed values and GAM-modelled values), each with its own SE confidence ribbon, and both ribbons start at the third data point instead of the first.

These are my data points (Bird.data):

ï..x y
1991 17
1992 76
1993 328
1994 131
1995 425
1996 892
1997 501
1998 419
1999 297
2000 277
2001 310
2002 282
2003 189
2004 278
2005 322
2006 444
2007 412
2008 241
2009 242
2010 255
2011 289
2012 335
2013 279
2014 628
2015 500
2016 174
2017 636
2018 420
2019 447
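The column name ï..x is the usual symptom of a CSV saved as UTF-8 with a byte-order mark (BOM) and read without declaring that encoding: the three BOM bytes get glued onto the first header name. A minimal sketch of the fix, assuming the data come from a CSV file; the original file name isn't given, so the sketch writes a small temporary file as a stand-in.

```r
# Stand-in for the original CSV (file name not given): a file saved as
# "UTF-8 with BOM", i.e. starting with the bytes EF BB BF before the header
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xef, 0xbb, 0xbf)), con)
writeBin(charToRaw("x,y\n1991,17\n1992,76\n1993,328\n"), con)
close(con)

# Read without an encoding hint: the BOM ends up in the first column name
# (how it is mangled, e.g. "ï..x", depends on the platform and locale)
print(names(read.csv(tmp)))

# Declaring the encoding as "UTF-8-BOM" strips the mark when reading
Bird.data <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(Bird.data))   # [1] "x" "y"
```

With the year column read as x, the model can be written as gam(y ~ s(x), data = Bird.data, method = "REML"), and aes(x, y) works without renaming anything.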

Fitted values (mod_gam1$fitted.values):

 [1]  95.56189 177.01468 255.17074 324.97532 380.28813 415.71334 428.67793 420.86624 398.18522 369.06325
[11] 341.72715 321.65585 310.33971 305.81158 304.53360 303.60521 302.21413 301.75501 304.77184 313.43400
[21] 328.37279 348.39076 371.04203 393.66222 414.29754 432.15104 447.48020 461.14595 474.09266

[Image omitted: Negative Binomial]


Solution

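  • It is because of the limits you have set with scale_y_continuous(). Setting scale limits discards every value that falls outside them (it is replaced by NA), and that includes the edges of the smoother's confidence band: at the first two years the lower edge of the band falls outside the limits (below 0), so that part of the ribbon is not drawn. If you remove that line (or lower the bottom limit far enough to include the minimum of the band), you will see the ribbon fill completely.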
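If you want to keep the 0 to 1000 view, a common alternative is to zoom with coord_cartesian() instead of setting scale limits: the coordinate system only crops what is displayed, so no data (and no part of the ribbon) are removed before drawing. A sketch of the question's plot with that one change, assuming the year column is named x and re-entering the listed data so the snippet runs on its own:

```r
library(mgcv)
library(ggplot2)

# Data from the question (year column assumed to be named x)
Bird.data <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277, 310, 282, 189, 278,
        322, 444, 412, 241, 242, 255, 289, 335, 279, 628, 500, 174, 636, 420, 447)
)

mod_gam1 <- gam(y ~ s(x), data = Bird.data, method = "REML")

# Stack modelled and observed values, as in the question
result <- data.frame(
  data   = c(fitted(mod_gam1), Bird.data$y),
  Year   = rep(Bird.data$x, times = 2),
  source = rep(c("Modelled", "Observed"), each = nrow(Bird.data))
)

p <- ggplot(result, aes(x = Year, y = data, colour = source)) +
  geom_point() +
  # Same loess smoother geom_smooth() picks by default for small data
  geom_smooth(method = "loess", formula = y ~ x, span = 0.8) +
  labs(x = "Year", y = "Bird Island Total Debris Count") +
  # Zoom instead of scale limits: values outside 0-1000 are kept,
  # so the ribbon is drawn for every year and simply clipped at the edge
  coord_cartesian(ylim = c(0, 1000))
print(p)
```

Note that this only fixes the clipping; the ribbon is still a loess band around the points, not the GAM's own interval.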

    However, there is a bigger problem: the ribbon is not showing the uncertainty of your GAM at all. geom_smooth() fits its own loess smoother (its default for fewer than 1,000 points; span = 0.8 is a loess argument) through the GAM's point predictions, so the band is the loess confidence interval. There are a couple of ways to fix this. The easiest might be to pass Bird.data directly to ggplot() and use the method and formula arguments of geom_smooth() to request the GAM smooth directly:

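    names(Bird.data)[1] <- "x"  # the year column was read in as "ï..x"
    
    ggplot(Bird.data, aes(x, y)) + 
      geom_point() + 
      geom_smooth(method = "gam", formula = y ~ s(x),
                  method.args = list(method = "REML")) +  # same fitting method as mod_gam1
      labs(x = "Year", y = "Bird Island Total Debris Count")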
    

    The drawback of this approach is that you no longer see the individual GAM predictions as points. You can get both with the following approach:

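    1. Add the standard errors of the GAM predictions to the result data frame (the first 29 rows are the modelled values; the observed rows get NA):
    result$se <- c(predict(mod_gam1, se.fit = TRUE)$se.fit, rep(NA, 29))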
    
    2. Use ggplot as before, but draw the band with geom_ribbon(), setting ymin and ymax directly to an approximate 95% interval (fitted value ± 1.96 × SE):
    ggplot(result, aes(x = Year, y = data, colour = source, fill = source)) +
      geom_point() + 
      geom_ribbon(aes(ymin = data - 1.96*se, ymax = data + 1.96*se), alpha = 0.2) +
      labs(x = "Year", y = "Bird Island Total Debris Count") +
      scale_y_continuous(limits = c(-200, 1000))
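Two refinements of this approach, sketched under stated assumptions: predicting on a fine grid of years gives a smooth band instead of one joined up from 29 points, and building the interval on the link scale before back-transforming keeps it valid for non-Gaussian families. The "Negative Binomial" label in the question suggests a negative binomial model was also tried; the sketch assumes that means mgcv's nb() family. The helper band() and the grid year_grid are hypothetical names, not part of mgcv, and the year column is assumed to be named x.

```r
library(mgcv)
library(ggplot2)

# Data from the question (year column assumed to be named x)
Bird.data <- data.frame(
  x = 1991:2019,
  y = c(17, 76, 328, 131, 425, 892, 501, 419, 297, 277, 310, 282, 189, 278,
        322, 444, 412, 241, 242, 255, 289, 335, 279, 628, 500, 174, 636, 420, 447)
)

# The Gaussian GAM from the question and an (assumed) negative binomial version
mod_gam1 <- gam(y ~ s(x), data = Bird.data, method = "REML")
mod_nb   <- gam(y ~ s(x), data = Bird.data, family = nb(), method = "REML")

# Hypothetical helper: pointwise interval built on the link scale,
# then mapped back to the response scale with the inverse link
band <- function(model, newdata, level = 0.95) {
  z    <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  pred <- predict(model, newdata = newdata, type = "link", se.fit = TRUE)
  eta  <- as.numeric(pred$fit)
  se   <- as.numeric(pred$se.fit)
  inv  <- model$family$linkinv
  data.frame(newdata,
             fit = inv(eta),
             lwr = inv(eta - z * se),
             upr = inv(eta + z * se))
}

# A fine grid of years gives a smooth ribbon rather than 29 joined points
year_grid <- data.frame(x = seq(1991, 2019, length.out = 200))
bands <- rbind(cbind(band(mod_gam1, year_grid), model = "Gaussian"),
               cbind(band(mod_nb, year_grid),   model = "Negative binomial"))

p <- ggplot(bands, aes(x = x)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = model), alpha = 0.2) +
  geom_line(aes(y = fit, colour = model)) +
  geom_point(data = Bird.data, aes(x = x, y = y), inherit.aes = FALSE) +
  labs(x = "Year", y = "Bird Island Total Debris Count")
print(p)
```

For the Gaussian model the identity link makes this exactly fit ± 1.96 × SE, matching the ribbon above. For the log-link negative binomial model the band is asymmetric on the count scale and never drops below 0, so no negative lower limit is needed on the y axis.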
    
