R ggplot2 geom_smooth without adding negative values


Assume that we ran install.packages("ggplot2"), install.packages("dplyr") and install.packages("babynames").

Then

library(babynames)
library(dplyr)    # for %>%, filter(), group_by(), summarise(), arrange()
library(ggplot2)
data(babynames)
my_d <- babynames %>%
  filter(
    name == "Josiah"   & sex == "M" |
      name == "Alicia"  & sex == "F" |
      name == "Gabriel"  & sex == "M" |
      name == "Joshua" & sex == "M"
  ) %>%
  group_by(name, year, sex) %>%
  summarise(n = sum(n) ) %>%
  arrange(year)

ggplot(my_d, aes(x = year, y = n, color = name) ) +
  geom_line() +
  scale_x_continuous(breaks = seq(1900, 2020, by = 10) )

gives

[Figure: raw yearly counts for Alicia, Gabriel, Joshua and Josiah]

It is nice, but I would like to smooth the lines "a little", so I do

ggplot(my_d, aes(x = year, y = n, color = name) ) +
  geom_smooth(se = FALSE) +
  scale_x_continuous(breaks = seq(1900, 2020, by = 10) )
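When no method is given, geom_smooth() picks the smoother from the size of the largest group: stats::loess() for fewer than 1,000 observations, otherwise mgcv::gam(). Each name here has one row per year, far below 1,000, so the call above is already a loess fit with formula y ~ x. The sketch below rebuilds the same data (with a compact filter() that is meant to be equivalent to the one in the question) and spells the choice out, which also silences the message ggplot2 prints about the method it picked:

```r
library(babynames)
library(dplyr)
library(ggplot2)

# The same four names as above, one row per name and year
my_d <- babynames %>%
  filter((name %in% c("Josiah", "Gabriel", "Joshua") & sex == "M") |
           (name == "Alicia" & sex == "F")) %>%
  group_by(name, year) %>%
  summarise(n = sum(n), .groups = "drop")

# Largest group size; below 1000 rows the automatic choice is loess
max(table(my_d$name))

# Explicit version of geom_smooth(se = FALSE) for this data;
# naming method and formula suppresses the "using method = ..." message
p <- ggplot(my_d, aes(x = year, y = n, color = name)) +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x) +
  scale_x_continuous(breaks = seq(1900, 2020, by = 10))
p
```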

which gives

[Figure: geom_smooth() curves for the four names; the curve for "Joshua" dips below zero]

The lines are smooth now, but the curve for "Joshua" dips below zero.

How could I avoid that "side effect"?
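To see exactly which curves go negative, and by how much, the values ggplot2 computed for the smoothing layer can be pulled out with layer_data(). A minimal sketch; mapping the numeric group ids back to names assumes ggplot2 numbers the groups in sorted order of name, which is its usual behaviour for a single discrete mapping:

```r
library(babynames)
library(dplyr)
library(ggplot2)

# The same four names as in the question, one row per name and year
my_d <- babynames %>%
  filter((name %in% c("Josiah", "Gabriel", "Joshua") & sex == "M") |
           (name == "Alicia" & sex == "F")) %>%
  group_by(name, year) %>%
  summarise(n = sum(n), .groups = "drop")

p <- ggplot(my_d, aes(x = year, y = n, color = name)) +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x)

# layer_data() builds the plot and returns what layer 1 actually draws:
# the smoothed x/y values, with one "group" id per name
smooth_vals <- layer_data(p, 1)

# Assumption: group ids follow the sorted order of the names
smooth_vals$name <- sort(unique(my_d$name))[smooth_vals$group]

# Lowest point of each fitted curve; a negative minimum means that
# curve dips below zero
aggregate(y ~ name, data = smooth_vals, FUN = min)
```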

EDIT: Changing

geom_smooth(se = FALSE)

to

geom_smooth(se = FALSE, method = "loess") +
  ylim(0, 30000)

removes the negative values, but the smoothing is still "too coarse", so to speak: it shows "Gabriel" with an increasing trend, which the data do not have. Here is the result:

[Figure: loess curves with ylim(0, 30000)]
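In the EDIT, ylim() does more than set the visible range. It sets the y scale limits, and ggplot2 drops observations outside scale limits before any statistic is computed, so yearly counts above 30000, if there are any, are removed from the loess fit itself, which can change the shape of the curves. Smoothed values below 0 are then dropped from the drawn line rather than avoided. coord_cartesian() zooms the view without discarding data. A minimal sketch of that alternative, reusing the EDIT's range of 0 to 30000:

```r
library(babynames)
library(dplyr)
library(ggplot2)

# The same four names as in the question, one row per name and year
my_d <- babynames %>%
  filter((name %in% c("Josiah", "Gabriel", "Joshua") & sex == "M") |
           (name == "Alicia" & sex == "F")) %>%
  group_by(name, year) %>%
  summarise(n = sum(n), .groups = "drop")

p <- ggplot(my_d, aes(x = year, y = n, color = name)) +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x) +
  scale_x_continuous(breaks = seq(1900, 2020, by = 10)) +
  # Zoom only: unlike ylim(0, 30000), no rows are removed before
  # the loess curves are fitted
  coord_cartesian(ylim = c(0, 30000))
p
```

The trade-off: the negative stretch of a curve is still computed and simply runs off the bottom of the panel, so this changes what is shown, not the fit.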


Solution

  • According to the documentation, the span = argument controls the amount of smoothing for the default loess smoother: smaller values make the lines "wigglier", larger values make them smoother. Playing around with it may solve your problem. Below is an example with span = .1. The rough edges are gone from the lines, but the general trends remain visible and true. Setting span too low may cause memory issues depending on the size of the data.

    library(babynames)
    library(tidyverse)
    data(babynames)
    my_d <- babynames %>%
      filter(
        name == "Josiah"   & sex == "M" |
          name == "Alicia"  & sex == "F" |
          name == "Gabriel"  & sex == "M" |
          name == "Joshua" & sex == "M"
      ) %>%
      group_by(name, year, sex) %>%
      summarise( n = sum(n) ) %>%
      arrange( year )
    
    ggplot( my_d, aes(x = year, y = n, color = name) ) +
      geom_smooth(se = FALSE, method = "loess", span = .1) +
      scale_x_continuous( breaks = seq(1900, 2020, by = 10) )
    

    Created on 2020-02-21 by the reprex package (v0.3.0)
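If the curves must never go below zero, a different technique from the span adjustment above (not part of this answer) is to fit loess yourself per name with stats::loess(), clamp the fitted values at 0 with pmax(), and draw the result with geom_line(). A minimal sketch, assuming the same four names and span = 0.1; smoothed, raw_fit and fit are illustrative names:

```r
library(babynames)
library(dplyr)
library(ggplot2)

# The same four names as in the question, one row per name and year
my_d <- babynames %>%
  filter((name %in% c("Josiah", "Gabriel", "Joshua") & sex == "M") |
           (name == "Alicia" & sex == "F")) %>%
  group_by(name, year) %>%
  summarise(n = sum(n), .groups = "drop")

smoothed <- my_d %>%
  group_by(name) %>%
  mutate(
    # One loess fit per name, built from that name's rows only
    raw_fit = predict(loess(n ~ year,
                            data = data.frame(n = n, year = year),
                            span = 0.1)),
    # Clamp: any negative fitted count becomes 0
    fit = pmax(0, raw_fit)
  ) %>%
  ungroup()

ggplot(smoothed, aes(x = year, y = fit, color = name)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1900, 2020, by = 10))
```

Clamping hides the overshoot rather than removing its cause: where the raw fit went negative, the line sits flat at 0, and comparing raw_fit with fit shows where that happened. The curves are evaluated at the observed years only.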