Tags: r, ggplot2, plot, graph, smoothing

Adding upper/lower cap to geom_smooth smoothed line


I want to cap the smoothed line produced by geom_smooth (ggplot2) with the loess method. My data do not go above 1, but the smoothed line does.

The only related post I found is from 2012, and it never received a solution (see https://groups.google.com/g/ggplot2/c/Mxsbb4p3V7Y).

For convenience, I reproduce here the working example originally posted in that thread. I hope this will not create too much trouble. As noted by the original poster, the smoothed line goes below 0, although min(y) = 0.007593811.

library(ggplot2)
y <- rep(0:1, each = 20, times = 5) + runif(10, 0, 0.05)
x <- seq_along(y)
ggplot() +
  geom_line(aes(x = x, y = y)) +
  geom_smooth(aes(x = x, y = y), method = 'loess', span = 0.20, se = FALSE)

Is it possible to add an upper/lower cap for geom_smooth, so that the values of the smoothed line produced with the loess method lie within a specific range (e.g., between 0 and 1)? Thank you all.

EDIT

Thank you both for the great solutions!


Solution

  • Your example includes values greater than 1, so let's make an equivalent data set that doesn't go above 1:

    library(ggplot2)
    
    df <- data.frame(x = 1:200, 
                     y = ifelse(rep(0:1 == 0, each = 20, times = 5), 
                                runif(200)/20, runif(200, 0.95, 1)))
    
    ggplot(df, aes(x, y)) +
      geom_line() +
      geom_smooth(method = 'loess', span = 0.20, se = FALSE) +
      geom_hline(yintercept = c(0, 1), linetype = 2)
    


    Although we could simply clamp the loess to the range [0, 1], a slightly more sophisticated approach would be to perform a logit transformation, regress on that, then reverse the transform. This gives a smoother result:

    df$logit_y <- log(df$y/(1 - df$y)) # logit transform
    df$pred <- predict(loess(logit_y ~ x, data = df, span = 0.2))
    df$pred <- exp(df$pred)/(exp(df$pred) + 1) # Reverse logit
    
    ggplot(df, aes(x, y)) +
      geom_line() +
      geom_line(aes(y = pred), color = "blue", linewidth = 1) +
      geom_hline(yintercept = c(0, 1), linetype = 2)
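
One caveat with the logit approach: `log(y / (1 - y))` is infinite when `y` is exactly 0 or 1, which breaks the loess fit. Below is a minimal sketch of one possible workaround, assuming it is acceptable to nudge such values slightly away from the boundaries first. The helper `squeeze()` and its tolerance `eps` are hypothetical names chosen for illustration, not part of the original answer. `qlogis()` and `plogis()` are base R's logit and inverse-logit functions.

```r
# Hypothetical helper: pull values into [eps, 1 - eps] so the logit stays finite.
squeeze <- function(p, eps = 1e-6) pmin(pmax(p, eps), 1 - eps)

set.seed(1)  # only so the illustration is reproducible
df <- data.frame(x = 1:200,
                 y = ifelse(rep(0:1 == 0, each = 20, times = 5),
                            runif(200) / 20, runif(200, 0.95, 1)))
df$y[c(1, 200)] <- c(0, 1)  # force exact boundary values to show the issue

df$logit_y <- qlogis(squeeze(df$y))                  # finite logit transform
fit <- loess(logit_y ~ x, data = df, span = 0.2)     # same span as above
df$pred <- plogis(predict(fit))                      # reverse logit

range(df$pred)  # back-transformed values stay inside [0, 1]
```

The choice of `eps` is a judgment call: a smaller value keeps the data closer to the original but produces more extreme values on the logit scale, which can pull the smoothed line harder towards the boundary.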
    

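
For comparison, the simpler clamping option mentioned above can be sketched as follows. This is a minimal illustration rather than part of the original answer: it fits the loess outside ggplot2 with the same `span`, then truncates the fitted values to [0, 1] with `pmax()` and `pmin()`. The column names `fit` and `fit_clamped` are made up for the example.

```r
set.seed(1)  # only so the illustration is reproducible
df <- data.frame(x = 1:200,
                 y = ifelse(rep(0:1 == 0, each = 20, times = 5),
                            runif(200) / 20, runif(200, 0.95, 1)))

# Fit the loess directly so the fitted values can be post-processed.
df$fit <- predict(loess(y ~ x, data = df, span = 0.2))
range(df$fit)  # compare with range(df$y) to see the overshoot

# Clamp the fitted values to the allowed range [0, 1].
df$fit_clamped <- pmin(pmax(df$fit, 0), 1)
range(df$fit_clamped)
```

The clamped column can then be drawn with `geom_line(aes(y = fit_clamped), ...)` in place of `pred` in the plot code above. Clamping leaves flat segments with sharp corners wherever the fit was cut off, which is why the logit approach gives a smoother line.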