Search code examples
rplotggplot2line-plot

Custom ggplot2 shaded error areas on categorical line plot


I'm trying to plot a line, smoothed by loess, but I'm trying to figure out how to include shaded error areas defined by existing variables, but also smoothed.

This code creates example data:

set.seed(12345)
data <- cbind(rep("A", 100), rnorm(100, 0, 1))
data <- rbind(data, cbind(rep("B", 100), rnorm(100, 5, 1)))
data <- rbind(data, cbind(rep("C", 100), rnorm(100, 10, 1)))
data <- rbind(data, cbind(rep("D", 100), rnorm(100, 15, 1)))
data <- cbind(rep(1:100, 4), data)
data <- data.frame(data)
names(data) <- c("num", "category", "value")
data$num <- as.numeric(data$num)
data$value <- as.numeric(data$value)
data$upper <- data$value+0.20
data$lower <- data$value-0.30

Plotting the data below, this is what I get:

ggplot(data, aes(x=num, y=value, colour=category)) +
  stat_smooth(method="loess", se=F)

enter image description here

What I'd like is a plot that looks like the following, except with the upper and lower bounds of the shaded areas being bounded by smoothed lines of the "upper" and "lower" variables in the generated data.

enter image description here

Any help would be greatly appreciated.


Solution

  • Here's one way to add smoothed versions of upper and lower. We'll add LOESS predictions for upper and lower to the data frame and then plot those using geom_ribbon. It would be more elegant if this could all be done within the call to ggplot. That's probably possible by feeding a special-purpose function to stat_summary, and hopefully someone else will post an answer using that approach.

    # Expand the scale of the upper and lower values so that the difference
    # is visible in the plot
    data$upper = data$value + 10
    data$lower = data$value - 10
    
    # Order data by category and num
    data = data[order(data$category, data$num),]
    
    # Create LOESS predictions for the values of upper and lower 
    # and add them to the data frame. I'm sure there's a better way to do this,
    # but my attempts with dplyr and tapply both failed, so I've resorted to the clunky 
    # method below.
    data$upperLoess = unlist(lapply(LETTERS[1:4], 
                      function(x) predict(loess(data$upper[data$category==x] ~ 
                                                      data$num[data$category==x]))))
    data$lowerLoess = unlist(lapply(LETTERS[1:4], 
                      function(x) predict(loess(data$lower[data$category==x] ~ 
                                                      data$num[data$category==x]))))
    
    # Use geom_ribbon to add a prediction band bounded by the LOESS predictions for 
    # upper and lower
    ggplot(data, aes(num, value, colour=category, fill=category)) +
      geom_smooth(method="loess", se=FALSE) +
      geom_ribbon(aes(x=num, y=value, ymax=upperLoess, ymin=lowerLoess), 
                  alpha=0.2)
    

    And here's the result:

    enter image description here