
Non-uniform bins in Histogram in R


I would like to divide a dataset (a vector of numeric values) into intervals and produce a frequency histogram showing how many values fall into each interval. If I use hist(dataset, breaks = 10), the data are split into roughly 10 equal-width intervals (R treats a single number as a suggestion and picks "pretty" breakpoints). Instead, I would like to divide the dataset into (e.g.) 10 bins in such a way that each interval contains at least 5% of the data points.
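To see why equal-width bins do not meet a per-bin 5% requirement, here is a minimal sketch that inspects what hist() computes without drawing anything. The data are simulated, right-skewed values (an assumption, since the actual dataset is not shown); `plot = FALSE` makes hist() return its breaks and counts instead of plotting.

```r
# Simulated right-skewed data (assumption: the real dataset is not shown)
set.seed(1)
dataset <- rexp(200)

# plot = FALSE returns the binning as a "histogram" object without drawing
h <- hist(dataset, breaks = 10, plot = FALSE)

# breaks = 10 is only a suggestion: R rounds it to "pretty" equal-width
# breakpoints, so the number of bins may differ from 10
print(h$breaks)
print(length(h$counts))

# Share of the data in each equal-width bin; with skewed data the tail
# bins typically hold well under 5% of the points
print(round(h$counts / length(dataset), 3))
```

The last line is the quantity the question constrains; equal-width breaks leave it uncontrolled.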


Solution

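  • You could use the quantile() function to define equal-frequency bins, i.e. breaks chosen so that each bin holds roughly the same number of observations (about binsize * N; ties and quantile interpolation make this approximate).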

    Here is an example on exponentially distributed data:

    # Seed for the random number generation (for repeatability)
    set.seed(1313)
    
    # Sample size
    N = 150
    
    # Size of each bin (as proportion of N)
    binsize = 0.05
    
    # Sample data
    x = rexp(N)
    
    # Regular histogram (equal-width bins)
    hist(x, breaks=20, freq=TRUE, main="Histogram on 20 equal-width bins", col="red")
    

    [Figure: histogram of x on 20 equal-width bins]

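    # Breaks at every `binsize` quantile: 0%, 5%, 10%, ..., 100%
    x.quantiles = quantile(x, probs=seq(0, 1, binsize))
    
    # Histogram on the equal-frequency breaks. freq=TRUE plots raw counts, so all bars
    # have about the same height; because the breaks are not equally spaced, R warns
    # that the bar areas are wrong (freq=FALSE gives a proper density scale)
    hist(x, breaks=x.quantiles, freq=TRUE, main=paste("Approx. equal-size-bin 'Histogram' (bin-size=", binsize*100, "% of ", N, ")", sep=""), col="cyan")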
    

    [Figure: approximately equal-frequency-bin histogram of x]
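    To check that the quantile breaks really give each bin its share of the data, here is a minimal sketch that counts the points per bin with cut(). The helper `equal_freq_breaks` is hypothetical (it is not part of base R); it also drops repeated quantiles, which can appear when the data contain many tied values, because hist() and cut() need strictly increasing breaks. Dropping duplicates merges the affected bins, so with heavy ties some bins will hold more than binsize * N points.

```r
# Hypothetical helper (not base R): equal-frequency breaks from quantiles
equal_freq_breaks <- function(x, binsize = 0.05) {
  # Breaks at the 0%, binsize, 2*binsize, ..., 100% quantiles
  br <- quantile(x, probs = seq(0, 1, by = binsize), names = FALSE)
  # Tied data can repeat a quantile; hist() and cut() need strictly
  # increasing breaks, so drop duplicates (this merges those bins)
  unique(br)
}

set.seed(1313)
N <- 150
binsize <- 0.05
x <- rexp(N)

br <- equal_freq_breaks(x, binsize)

# Count points per bin; include.lowest = TRUE keeps the minimum in the
# first bin, matching hist()'s default (right-closed bins)
counts <- table(cut(x, breaks = br, include.lowest = TRUE))
print(counts)

# Each bin should hold roughly N * binsize = 7.5 points
print(range(counts))

# With unequal bin widths, freq = FALSE draws a density-scaled histogram
# (bar area proportional to the share of points) and avoids the warning
hist(x, breaks = br, freq = FALSE, col = "cyan",
     main = "Equal-frequency bins (density scale)")
```

    Because every bin holds about the same number of points, the density-scaled bars are tall where the bins are narrow (dense data) and short where they are wide (sparse tail).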