Search code examples
rplotggplot2

How to combine stat_ecdf with geom_ribbon?


I am trying to draw an ECDF of some data with a "confidence interval" represented via a shaded region using ggplot2. I am having trouble combining geom_ribbon() with stat_ecdf() to achieve the effect I am after.

Consider the following example data:

set.seed(1)
dat <- data.frame(variable = rlnorm(100) + 2)
dat <- transform(dat, lower = variable - 2, upper = variable + 2)

> head(dat)
  variable     lower    upper
1 2.534484 0.5344838 4.534484
2 3.201587 1.2015872 5.201587
3 2.433602 0.4336018 4.433602
4 6.929713 4.9297132 8.929713
5 3.390284 1.3902836 5.390284
6 2.440225 0.4402254 4.440225

I am able to produce an ECDF of variable using

library("ggplot2")
ggplot(dat, aes(x = variable)) +
    geom_step(stat = "ecdf")

However I am unable to use lower and upper as the ymin and ymax aesthetics of geom_ribbon() to superimpose the confidence interval on the plot as another layer. I have tried:

ggplot(dat, aes(x = variable)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), stat = "ecdf") +
    geom_step(stat = "ecdf")

but this raises the following error

Error: geom_ribbon requires the following missing aesthetics: ymin, ymax

Is there a way to coax geom_ribbon() into working with stat_ecdf() to produce a shaded confidence interval? Or, can anyone suggest an alternative means of adding a shaded polygon defined by lower and upper as a layer to the ECDF plot?


Solution

  • Try this (a bit of shot in the dark):

    ggplot(dat, aes(x = variable)) +
      geom_ribbon(aes(x = variable,ymin = ..y..-2,ymax = ..y..+2), stat = "ecdf",alpha=0.2) +
      geom_step(stat = "ecdf")
    

    Ok, so that's not the same thing as what you trying to do, but it should explain what's going on. The stat is returning a data frame with just the original x and the computed y, so I think that's all you have to work with. i.e. stat_ecdf only computes the cumulative distribution function for a single x at a time.

    The only other thing I can think of is the obvious, calculating the lower and upper separately, something like this:

    l <- ecdf(dat$lower)
    u <- ecdf(dat$upper)
    v <- ecdf(dat$variable)
    dat$lower1 <- l(dat$variable)
    dat$upper1 <- u(dat$variable)
    dat$variable1 <- v(dat$variable)
    
    ggplot(dat,aes(x = variable)) + 
      geom_step(aes(y = variable1)) + 
      geom_ribbon(aes(ymin = upper1,ymax = lower1),alpha = 0.2)