r, ggplot2, ecdf

Get the data associated with ggplot + stat_ecdf()


I like the stat_ecdf() feature of the ggplot2 package, which I find quite useful for exploring a data series. However, the result is only visual. Is it possible to get the associated table, and if so, how?

Please have a look at the following reproducible example:

library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) + stat_ecdf() # build the cumulative distribution chart
p
attributes(p) # chart attributes
p$data # this is the iris dataset, not the series used to draw the chart



Solution

  • We can recreate the data:

    # Recreate the ECDF data: one row per distinct value, sorted by x
    x_vals <- sort(unique(iris$Sepal.Length))
    # ecdf() already returns cumulative proportions in [0, 1], so no rescaling is needed
    # (a min-max rescale would push the first point to 0 instead of 1/150)
    dat_ecdf <-
      data.frame(x = x_vals,
                 y = ecdf(iris$Sepal.Length)(x_vals))
    

    The two plots below should look the same:

    library(ggplot2)

    # plot using the recreated data
    ggplot(dat_ecdf, aes(x, y)) +
      geom_step() +
      xlim(4, 8)

    # plot with the built-in stat_ecdf()
    ggplot(iris, aes(x = Sepal.Length)) +
      stat_ecdf() +
      xlim(4, 8)
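
As a cross-check on the recreated table, it may also be possible to read the values that ggplot2 itself computed for the layer. The sketch below uses layer_data(), which is part of ggplot2, and compares its output with a table built directly from ecdf(). It assumes the computed columns are named x and y and that stat_ecdf() keeps its default pad = TRUE (which adds -Inf and Inf rows); both may differ between ggplot2 versions. The names built, x_ref and y_ref are only illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) + stat_ecdf()

# layer_data() returns the data frame computed for layer 1 once the stat has run
built <- layer_data(p, 1)

# Assumption: with pad = TRUE the series is padded with -Inf and Inf; drop those rows
built <- built[is.finite(built$x), c("x", "y")]

# The computed rows are not guaranteed to be sorted, so order them by x
built <- built[order(built$x), ]
rownames(built) <- NULL

# Reference table built directly from the data with base R
x_ref <- sort(unique(iris$Sepal.Length))
y_ref <- ecdf(iris$Sepal.Length)(x_ref)

# Both comparisons are expected to agree if the assumptions above hold
all.equal(built$x, x_ref)
all.equal(built$y, y_ref)
```

If the two comparisons agree, built can be used as the table behind the chart, with no need to recompute it by hand.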