r, ggplot2, histogram, cdf

How to replicate a picture with histogram and CDF function in one picture?


I'm trying to replicate this figure with my own data using ggplot2.

For now I don't care about the colors or the legend. I just need to show the number of new repositories over a one-year period together with the CDF (cumulative distribution function) of the creation date, on a dual y-axis with the 1.0 value positioned in the upper right corner, as in the example.

So far I have this code:

repo.count <- read.csv("http://pastebin.com/raw.php?i=PfUgyrt0",sep=",")
repo.count$createdAt <- as.Date(repo.count$createdAt)

ggplot(data = repo.count, aes(x = createdAt)) +
  geom_histogram(colour = 1, fill = "white", position="identity") +
  xlab('Date') +
  ylab('# New Repositories') +
  ggtitle('')+
  scale_x_date(labels = scales::date_format("%Y-%b"),
             breaks = by_month(date.groups$createdAt,2)) + 
  scale_y_continuous(sec.axis = sec_axis(~. / max(repo.count$total), 
                                         name = "Cumulative distribution")) +
  theme(axis.text.x = element_text(angle=90)) + stat_ecdf(size=1)

But the result is far from what I'd like to achieve.


Solution

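  • I wonder whether you actually want a bar plot of new repos per day with a line showing cumulative repos, rather than a histogram as such. If so, use geom_col instead of geom_histogram and map the y aesthetic to repositories. The cumulative total is drawn with geom_step, divided by 100 so it fits on the primary axis, and the secondary axis multiplies by 100 to undo that scaling: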

    ggplot(data = repo.count, aes(x = createdAt, y = repositories)) +
      geom_col(fill = "green4", color = NA) +
      geom_step(aes(y = total / 100)) +
      xlab('Date') +
      ylab('# New Repositories') +
      ggtitle('')+
      scale_x_date(labels = scales::date_format("%Y-%b"),
                   date_breaks = 'month') + 
      scale_y_continuous(sec.axis = sec_axis(~. * 100, 
                                             name = "Cumulative repos")) +
      theme_classic(base_size = 16) +
      theme(axis.text.x = element_text(angle = 90))
    

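If you want the secondary axis to read as a CDF with 1.0 in the upper right corner, as the question asks, one option is to rescale the cumulative share by the height of the tallest bar instead of a fixed factor of 100. The following is only a minimal sketch: the data frame is synthetic stand-in data (the real repo.count is assumed to have createdAt and repositories columns, as in the code above), and the names cdf and k are introduced here purely for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for repo.count: one row per day with the number of
# new repositories. Replace this with the real data.
set.seed(1)
repo.count <- data.frame(
  createdAt    = seq(as.Date("2015-01-01"), by = "day", length.out = 365),
  repositories = rpois(365, lambda = 20)
)

# Empirical CDF of the creation dates: share of all repositories created
# up to and including each day. The last value is 1.
repo.count$cdf <- cumsum(repo.count$repositories) / sum(repo.count$repositories)

# Scale factor that maps cdf = 1.0 onto the height of the tallest bar.
k <- max(repo.count$repositories)

p <- ggplot(repo.count, aes(x = createdAt, y = repositories)) +
  geom_col(fill = "green4", color = NA) +
  # Draw the CDF in primary-axis units so it shares the panel with the bars.
  geom_step(aes(y = cdf * k)) +
  scale_x_date(date_labels = "%Y-%b", date_breaks = "1 month") +
  scale_y_continuous(
    name = "# New Repositories",
    # No padding, so the top of the panel sits exactly at k (i.e. cdf = 1.0).
    expand = expansion(mult = c(0, 0)),
    # Undo the scaling so the right-hand axis runs from 0 to 1.
    sec.axis = sec_axis(~ . / k, name = "Cumulative distribution")
  ) +
  xlab("Date") +
  theme_classic(base_size = 16) +
  theme(axis.text.x = element_text(angle = 90))

print(p)
```

Because the same factor k is used both to scale the line and to invert the secondary axis, the two stay consistent whatever the data look like. expansion() needs ggplot2 3.3.0 or later; on older versions you would have to set the axis limits another way.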