ggplot smoothing in R - interpretation


[Figure: Count of words over time]

I created the following plot in R with this code:

    ggplot(sentiment, aes(x = year, y = nrc_sent$sentiment)) +
      geom_smooth(method = "auto") +  # pick a method & fit a model
      scale_x_continuous(breaks = round(seq(min(sentiment$year), max(sentiment$year), by = 2), 1)) +
      labs(x = "", y = "")

When I run the code, I get this message: geom_smooth() using method = 'loess'

Here nrc_sent is:

    > nrc_sent
    # A tibble: 519 x 3
       sentiment state    year
           <dbl> <chr>   <dbl>
     1      152. Alabama 2007.
     2      107. Alabama 2008.
     3       80. Alabama 2009.
     4       75. Alabama 2010.
     5      173. Alabama 2011.
     6      180. Alabama 2012.
     7      187. Alabama 2013.
     8      167. Alabama 2014.
     9      124. Alabama 2015.
    10      215. Alabama 2016.
    # ... with 509 more rows

I am puzzled as to what the shaded area around the line represents. I looked at the ggplot2 help page, but I could not find anything I can use in my academic article to explain what the graph shows and what the shaded area is. I would appreciate any help with this.


Solution

  • If you look at the documentation for geom_smooth (?geom_smooth), it states that the parameter se controls whether a confidence interval is displayed around the fitted line. If se is TRUE, the parameter level sets the confidence level to use, with a default of 0.95. So the shaded area in your plot is the 95% confidence interval around the loess fit.

    My guess is that this will also work for you. Try playing with the level.

    ggplot(sentiment, aes(x = year, y = nrc_sent$sentiment)) +
      geom_smooth(method = "loess", se = TRUE, level = 0.95) +  # pick a method & fit a model
      scale_x_continuous(breaks = round(seq(min(sentiment$year), max(sentiment$year), by = 2), 1)) +
      labs(x = "", y = "")
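To see what the band actually is, it may help to rebuild it by hand. The sketch below is only an illustration under stated assumptions: `demo` is made-up data standing in for nrc_sent (same column names, invented values), the columns are referenced by name inside aes(), and the interval is computed the way I understand geom_smooth does it for loess, namely a pointwise t-based interval from predict(..., se = TRUE). If that understanding is right, the dashed lines should sit on the edges of the shaded area.

```r
library(ggplot2)

# Hypothetical stand-in for nrc_sent: 20 invented "states" observed 2007-2016
set.seed(1)
demo <- data.frame(
  year  = rep(2007:2016, times = 20),
  state = rep(paste0("state_", 1:20), each = 10)
)
demo$sentiment <- 100 + 8 * (demo$year - 2007) + rnorm(nrow(demo), sd = 30)

# Fit the smoother directly, with the loess() defaults (span = 0.75, degree = 2)
fit  <- loess(sentiment ~ year, data = demo)
grid <- data.frame(year = seq(2007, 2016, by = 0.5))
pred <- predict(fit, newdata = grid, se = TRUE)

# Pointwise interval: fitted value +/- t quantile * standard error of the fit
level      <- 0.95
half_width <- qt(level / 2 + 0.5, pred$df) * pred$se.fit
grid$fit   <- pred$fit
grid$lower <- pred$fit - half_width
grid$upper <- pred$fit + half_width

# Overlay the hand-computed limits (dashed) on the geom_smooth band
p <- ggplot(demo, aes(x = year, y = sentiment)) +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE, level = level) +
  geom_line(data = grid, aes(y = lower), linetype = "dashed") +
  geom_line(data = grid, aes(y = upper), linetype = "dashed")
print(p)
```

Changing `level` to 0.99 widens the band, and se = FALSE removes it. For the article, a wording along the lines of "the line is a loess-smoothed mean and the shaded area is its pointwise 95% confidence interval" would match this computation. It describes uncertainty in the fitted curve, not the spread of the individual observations.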