I'm trying to replicate this figure with my own data using ggplot2.
In my case, I don't care about the colors and legend for now. I just need to show the number of new repositories over a one-year period together with the CDF (cumulative distribution function) of the creation date, on a dual y-axis with the 1.0 value positioned in the upper-right corner, as in the example.
So far I got this code:
repo.count <- read.csv("http://pastebin.com/raw.php?i=PfUgyrt0",sep=",")
repo.count$createdAt <- as.Date(repo.count$createdAt)
ggplot(data = repo.count, aes(x = createdAt)) +
  geom_histogram(colour = 1, fill = "white", position = "identity") +
  xlab('Date') +
  ylab('# New Repositories') +
  ggtitle('') +
  scale_x_date(labels = scales::date_format("%Y-%b"),
               breaks = by_month(date.groups$createdAt, 2)) +
  scale_y_continuous(sec.axis = sec_axis(~. / max(repo.count$total),
                                         name = "Cumulative distribution")) +
  theme(axis.text.x = element_text(angle = 90)) + stat_ecdf(size = 1)
But it's far from what I'd like to achieve:
I wonder if you are perhaps not looking for a histogram as such, but just a bar plot of new repos per day and a line indicating cumulative repos? If so, use geom_col instead of geom_histogram, and specify the y variable as repositories:
ggplot(data = repo.count, aes(x = createdAt, y = repositories)) +
  geom_col(fill = "green4", color = NA) +
  # shrink the cumulative total by 100 so it fits on the primary axis
  geom_step(aes(y = total / 100)) +
  xlab('Date') +
  ylab('# New Repositories') +
  ggtitle('') +
  scale_x_date(labels = scales::date_format("%Y-%b"),
               date_breaks = 'month') +
  # multiply by 100 again so the secondary axis shows the real totals
  scale_y_continuous(sec.axis = sec_axis(~. * 100,
                                         name = "Cumulative repos")) +
  theme_classic(base_size = 16) +
  theme(axis.text.x = element_text(angle = 90))
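The plot above puts raw cumulative counts on the right-hand axis. If you want that axis to read as a CDF running from 0 to 1, with 1.0 in the upper-right corner as the question asks, one possible approach is to divide total by its maximum and stretch the result to the height of the tallest bar. The sketch below uses made-up data with the same column names (createdAt, repositories, total) so that it runs on its own. The cdf column and scale.factor are names I introduced for illustration, and the plotting part is skipped if ggplot2 is not installed.

```r
# Synthetic stand-in for repo.count (made-up numbers, same column names)
set.seed(1)
repo.count <- data.frame(
  createdAt    = seq(as.Date("2015-01-01"), by = "day", length.out = 365),
  repositories = rpois(365, lambda = 20)
)
repo.count$total <- cumsum(repo.count$repositories)

# Cumulative share of all repositories created so far; the last value is 1
repo.count$cdf <- repo.count$total / max(repo.count$total)

# Stretch the CDF to the height of the tallest bar so that 1.0 sits at the top
scale.factor <- max(repo.count$repositories)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(repo.count, aes(x = createdAt)) +
    geom_col(aes(y = repositories), fill = "green4", color = NA) +
    geom_step(aes(y = cdf * scale.factor)) +
    scale_x_date(date_labels = "%Y-%b", date_breaks = "2 months") +
    scale_y_continuous(
      name = "# New Repositories",
      # undo the stretch so the right-hand axis reads 0 to 1
      sec.axis = sec_axis(~ . / scale.factor, name = "Cumulative distribution")
    ) +
    xlab("Date") +
    theme_classic(base_size = 16) +
    theme(axis.text.x = element_text(angle = 90))
  print(p)
}
```

The secondary axis is only a relabelling of the primary one, so whatever factor is applied to the data inside geom_step has to be undone inside sec_axis. With your real data, drop the synthetic data frame and keep the two derived values.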