Non-uniform bins in Histogram in R

I would like to divide a dataset (a vector of numeric values) into intervals and produce a frequency histogram to see how many values fall into each interval. If I use hist(dataset, breaks = 10), the dataset is divided into about 10 equal-width intervals. Instead, I would like to divide the dataset into (e.g.) 10 bins in such a way that each interval contains at least 5% of the data points.

Solution

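You could use the quantile() function to compute break points so that every bin holds roughly the same share of the data points (equal-count rather than equal-width bins).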

Here is an example on exponentially distributed data:

# Seed for the random number generation (for repeatability)
seed = 1313
set.seed(seed)

# Sample size
N = 150

# Size of each bin (as proportion of N)
binsize = 0.05

# Sample data
x = rexp(N)

# Regular histogram (equal-width bins)
hist(x, breaks=20, freq=TRUE, main="Histogram on 20 equal-width bins", col="red")

# Break points at every `binsize` quantile (0%, 5%, ..., 100%)
x.quantiles = quantile(x, probs=seq(0, 1, binsize))

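# Histogram on the quantile breaks (bins hold roughly equal counts, so their widths differ).
# With unequal-width bins, freq=TRUE makes R warn that the bar areas are misleading;
# use freq=FALSE to plot densities instead.
hist(x, breaks=x.quantiles, freq=TRUE,
     main=paste("Approx. equal-size-bin 'Histogram' (bin-size=", binsize*100, "% of ", N, ")", sep=""),
     col="cyan")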
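To check that the quantile breaks really do spread the points evenly, the per-bin counts can be read off without drawing anything. The following is only a minimal sketch: it regenerates its own sample so it runs on its own, and the names brks, counts and counts_cut are illustrative, not part of the answer above. It assumes continuous data with no ties; with heavily tied data some quantiles may coincide, and the breaks would need deduplicating first.

```r
# Self-contained sketch: regenerate a sample like the one in the example
set.seed(1313)
x <- rexp(150)

# 5% steps give 21 break points (20 bins), running from min(x) to max(x)
brks <- quantile(x, probs = seq(0, 1, by = 0.05))

# Per-bin counts from hist() without plotting
counts <- hist(x, breaks = brks, plot = FALSE)$counts
print(counts)

# The same tally via cut() and table(); include.lowest = TRUE keeps min(x)
counts_cut <- as.vector(table(cut(x, breaks = brks, include.lowest = TRUE)))
print(counts_cut)
```

Since 5% of 150 is 7.5, each bin should end up with about 7 or 8 points, so the "at least 5%" requirement is met only approximately when the sample size is not a multiple of the number of bins.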