I would like to divide a dataset (a vector of numeric values) into intervals and produce a frequency histogram showing how many values fall into each interval. If I use `hist(dataset, breaks = 10)`, the range of the data is divided into (roughly) 10 equal-width intervals. Instead, I would like to divide the dataset into (e.g.) 10 bins in such a way that each interval contains at least 5% of the data points.
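To see the problem concretely, one can compute the equal-width histogram without drawing it and inspect the share of points per bin. This is a minimal sketch, assuming an exponential sample as a hypothetical stand-in for `dataset` (the question does not give the data). On skewed data like this, the bins in the long right tail typically hold very few points.

```r
# Hypothetical stand-in for `dataset`: a skewed (exponential) sample
set.seed(1313)
dataset <- rexp(150)

# Compute the equal-width histogram without plotting it
h <- hist(dataset, breaks = 10, plot = FALSE)

# breaks = 10 is only a suggestion to R, so the actual number of bins may differ
n_bins <- length(h$counts)

# Proportion of the data falling into each equal-width bin
share <- h$counts / length(dataset)
print(round(share, 3))
```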
You could use the `quantile()` function to define equal-frequency bins, i.e. bins that each contain roughly the same number of data points.
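As a minimal sketch of the same idea for a chosen number of bins `k` (the helper name `equal_count_breaks` is hypothetical, not part of base R), the breaks are the quantiles at probabilities 0, 1/k, ..., 1. With 10 bins each bin holds about 10% of the points, which satisfies the "at least 5%" requirement when there are no heavy ties.

```r
# Hypothetical helper: breaks that split x into k bins with roughly equal counts
equal_count_breaks <- function(x, k) {
  br <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)
  # Ties in x can produce duplicate breaks, which hist() and cut() reject
  unique(br)
}

set.seed(1313)
x <- rexp(150)
br <- equal_count_breaks(x, 10)

# Counts per bin, computed without plotting
counts <- hist(x, breaks = br, plot = FALSE)$counts
print(counts)
```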
Here is an example on exponentially distributed data:
# Seed for the random number generation (for repeatability)
seed = 1313
set.seed(seed)
# Sample size
N = 150
# Size of each bin (as proportion of N)
binsize = 0.05
# Sample data
x = rexp(N)
# Regular histogram (equal-width bins; R treats breaks=20 as a suggestion)
hist(x, breaks=20, freq=TRUE, main="Histogram on 20 equal-width bins", col="red")
# Breaks at every `binsize` quantile (0%, 5%, ..., 100%)
x.quantiles = quantile(x, probs=seq(0, 1, binsize))
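# Histogram on the equal-count breaks. The bar heights are the counts per bin;
# because the bins have unequal widths, R warns that the bar AREAS are misleading
# (use freq=FALSE for a proper density histogram)
hist(x, breaks=x.quantiles, freq=TRUE,
     main=paste("Approx. equal-size-bin 'Histogram' (bin-size=", binsize*100, "% of ", N, ")", sep=""),
     col="cyan")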
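To see which values fall into each interval (not just how many), the same quantile breaks can be passed to `cut()`. This is a sketch that reuses the settings of the example above; `include.lowest = TRUE` keeps the minimum inside the first bin, matching `hist()`'s default right-closed intervals. Since 150 points cannot be split exactly into 20 bins of 7.5, each bin ends up with 7 or 8 points, which is why the bins are only approximately 5% each.

```r
set.seed(1313)
N <- 150
binsize <- 0.05
x <- rexp(N)
x.quantiles <- quantile(x, probs = seq(0, 1, binsize))

# Assign each value to its interval (right-closed, lowest break included)
bins <- cut(x, breaks = x.quantiles, include.lowest = TRUE)

# Number and share of points per interval
counts <- table(bins)
print(round(counts / N, 3))

# The actual values that fall into each interval
members <- split(x, bins)
```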