I'm looking to create an equal frequency (aka equal bin count) histogram in R, ideally producing a plot in ggplot.
I can see that an equal bin width plot is pretty straightforward in ggplot, as is selecting how many bins you have if you want them to be equal width. However, asking R to choose bin widths so that each bin holds a near-equal number of samples seems more complicated...
Many thanks in advance!
So far, I have used the classInt package and its classIntervals function to define the bands (the min and max values of each bin) that would give equal numbers of data points in each bin (see below), but I can't see how to integrate these into the code that defines the bin widths of the histogram.
bands= classIntervals(dataset$Depth, 10, style = 'quantile')
bands
style: quantile
[0.1545109,0.1616876) [0.1616876,0.1682627) [0.1682627,0.1713514)
2233 2232 2232
[0.1713514,0.1736983) [0.1736983,0.1758581) [0.1758581,0.1792968)
2233 2232 2232
[0.1792968,0.1869507) [0.1869507,0.1913873) [0.1913873,0.2064948)
2233 2232 2232
[0.2064948,0.5918484]
2233
Is there a way of integrating these bands into the basic ggplot code for a histogram, perhaps by defining 'binwidth'?
ggplot(dataset, aes(x=Depth))+geom_histogram()
Or alternatively, is there another way to create equal frequency histograms in R someone might be able to suggest?
You can create quantile-based break points with quantile():
x <- faithful$waiting
ncells <- 10
breaks <- quantile(x, seq(0,1,by=1/ncells))
Depending on the ratio of the number of data points to the number of cells, and on whether there are ties, the cells might not hold exactly equal counts. That is the case in the above example of the faithful data, where the counts are only approximately equal:
table(cut(x, breaks=breaks))
(43,51] (51,55] (55,60] (60,71] (71,76] (76,78] (78,81] (81,83] (83,86] (86,96]
31 27 24 29 31 27 31 26 22 23
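One detail worth noting: the counts above sum to 271, while faithful has 272 rows. cut() uses right-closed intervals by default, so the minimum value (43) falls outside the first cell (43,51]. A minimal sketch of one way to keep it, assuming include.lowest = TRUE is acceptable for your purposes, and with unique() added as a guard in case heavy ties produce duplicate break points:

```r
x <- faithful$waiting
ncells <- 10

# Quantile-based break points; unique() guards against duplicated
# breaks, which cut() and hist() would otherwise reject.
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = ncells + 1)))

# include.lowest = TRUE closes the first cell on the left,
# so the minimum value is counted as well.
counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))
counts
```

Note that hist() already includes the lowest value by default, so this only matters for the table(cut(...)) check.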
These break points can then be used for the parameter breaks in hist():
hist(x, breaks = breaks, probability = TRUE)
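Since the question asks for ggplot, the same break points can probably be passed to geom_histogram() through its breaks argument rather than binwidth. The following is a sketch under a couple of assumptions: it uses the faithful data in place of dataset$Depth, and it assumes ggplot2 3.3.0 or later for after_stat(). The fill and colour values are purely cosmetic choices.

```r
library(ggplot2)

ncells <- 10

# Quantile-based break points, as above; unique() guards against ties.
breaks <- unique(quantile(faithful$waiting,
                          probs = seq(0, 1, length.out = ncells + 1)))

# Plot density rather than count: with unequal bin widths, the bar
# area (not the bar height) is what reflects the share of data per bin.
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)),
                 breaks = breaks,
                 colour = "black", fill = "grey80")
print(p)
```

For the original data, replacing faithful with dataset and waiting with Depth should be all that is needed. Plotting counts instead of density would give bars of roughly equal height, which is correct for an equal-frequency histogram but hides the shape of the distribution.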