I'm trying to understand the behaviour of binwidth in geom_histogram() when it is applied to POSIXct / datetime values. The documentation says that binwidth specifies the width of the bins, that it can be given as a numeric value, and that the bin width of a date variable is the number of days in each time. So I would expect the following two ggplot commands to produce the same output.
Not only is this not the case, but the second command also takes about 5 minutes to run.
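library(ggplot2)
# Ten timestamps, one per day
df <- data.frame(day = as.POSIXct("2018-11-01 10:00:00") + (1:10) * 3600 * 24)
# First command: ask for 10 bins
ggplot(df, aes(day)) +
  geom_histogram(bins = 10, colour = "black", fill = "grey")
# Second command: ask for a bin width of 1 (this is the slow one)
ggplot(df, aes(day)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "grey")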
Created on 2018-11-04 by the reprex package (v0.2.0).
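I've had the rubber duck experience and found that by "date" the documentation specifically means a vector of class Date. The behaviour of binwidth with class POSIXct is described in the follow-up sentence: the bin width of a time variable is the number of seconds. With binwidth = 1, the second command therefore asks for one-second bins across a nine-day range, which is roughly 780,000 bins and explains why it is so slow.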
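The difference in units can be checked in base R without plotting anything. This is a minimal sketch: the names t_posix, t_date, step_* and span_* are made up for illustration, and tz = "UTC" is an assumption added here so the conversion to Date does not depend on the local time zone.

```r
# Same ten timestamps as in the question (tz = "UTC" is an added assumption)
t_posix <- as.POSIXct("2018-11-01 10:00:00", tz = "UTC") + (1:10) * 3600 * 24
t_date  <- as.Date(t_posix)

# Underlying numeric step between consecutive values
step_posix <- diff(as.numeric(t_posix))[1]  # 86400: POSIXct counts seconds
step_date  <- diff(as.numeric(t_date))[1]   # 1: Date counts days

# Range that bins of width 1 have to cover
span_posix <- diff(range(as.numeric(t_posix)))  # 777600 seconds
span_date  <- diff(range(as.numeric(t_date)))   # 9 days
```

So a binwidth of 1 covers the same data with about ten bins for Date, but with several hundred thousand bins for POSIXct.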
In short, the solution is to multiply binwidth by 3600*24 so that it is expressed in days instead of seconds.
ggplot(df, aes(day)) +
  geom_histogram(binwidth = 1 * 3600 * 24, colour = "black", fill = "grey")
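If the time of day is not needed, another option is to convert the column to class Date, so that binwidth = 1 means one day directly. This is a sketch rather than a recommendation: the column name date and the object p are hypothetical, and tz = "UTC" is an assumption added here so that as.Date() gives the same calendar days on any machine.

```r
library(ggplot2)

# Same data as in the question (tz = "UTC" is an added assumption)
df <- data.frame(
  day = as.POSIXct("2018-11-01 10:00:00", tz = "UTC") + (1:10) * 3600 * 24
)

# Drop the time of day: Date values are counted in days, not seconds
df$date <- as.Date(df$day)

# binwidth = 1 now means one day per bin
p <- ggplot(df, aes(date)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "grey")
p
```

Note that as.Date() discards the hours, so this only fits data where daily resolution is enough.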