c++ histogram logarithm binning

How to do logarithmic binning on a histogram?


I'm looking for a technique to logarithmically bin some data sets. We've got data with values ranging from _min to _max (floats >= 0) and the user needs to be able to specify a varying number of bins _num_bins (some int n).

I've implemented a solution based on this question, with some help on scaling from here, but it stops working when my data values lie below 1.0.

class Histogram {
    double _min, _max;
    int _num_bins;
    ......
};

double Histogram::logarithmicValueOfBin(double in) const {
    if (in == 0.0)
        return _min;

    double b = std::log(_max / _min) / (_max - _min);
    double a = _max / std::exp(b * _max);

    double in_unscaled = in * (_max - _min) / _num_bins + _min;
    return a * std::exp(b * in_unscaled) ;
}
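
For reference, the expression returned above can be simplified. This is only a sketch of the algebra, assuming _min > 0 and using the shorthand i for the argument `in`, n for _num_bins, and x_i for in_unscaled (these symbols are introduced here and do not appear in the code):

```latex
\begin{aligned}
x_i &= \mathrm{min} + \tfrac{i}{n}\,(\mathrm{max} - \mathrm{min}), \qquad
b = \frac{\ln(\mathrm{max}/\mathrm{min})}{\mathrm{max} - \mathrm{min}}, \qquad
a = \mathrm{max}\, e^{-b\,\mathrm{max}} \\
a\, e^{b x_i} &= \mathrm{max}\, \exp\!\bigl(b\,(x_i - \mathrm{max})\bigr)
             = \mathrm{max} \left(\frac{\mathrm{max}}{\mathrm{min}}\right)^{\frac{i}{n} - 1}
             = \mathrm{min} \left(\frac{\mathrm{max}}{\mathrm{min}}\right)^{\frac{i}{n}}
\end{aligned}
```

Under that assumption the returned values are geometrically spaced between _min and _max. The ratio _max / _min is undefined when _min is 0, so the formula cannot be evaluated in that case.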

When the data values are all greater than 1, I get nicely sized bins and can plot properly. When the values are less than 1, the bins come out more or less the same size and we get way too many of them.


Solution

  • I found a solution by reimplementing an open-source version of MATLAB's logspace function.

    Given a range and a number of bins, you first need to create an evenly spaced numerical sequence. Its values will be used as exponents in the next step:

    module.exports = function linspace(a,b,n) {
      var every = (b-a)/(n-1),
          ranged = integers(a,b,every);
    
      return ranged.length == n ? ranged : ranged.concat(b);
    }
    

    After that, loop through each value in the sequence and raise your base (most likely e, 2 or 10) to that power. The results are your bin edges. Note that a and b here are exponents, so the output runs from 10^a to 10^b:

    module.exports.logspace = function logspace(a,b,n) {
      return linspace(a,b,n).map(function(x) { return Math.pow(10,x); });
    }
    

    I rewrote this in C++, and it supports any range of values greater than 0.
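
The C++ rewrite is not shown in the answer, so the following is only a minimal sketch of what such a port might look like, not the author's actual code. The names `linspace` and `logBinEdges` are hypothetical, base 10 is assumed, and the function returns num_bins + 1 edges (one more than the number of bins), which is a choice made for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the original post):
// n evenly spaced values from a to b, both endpoints included.
std::vector<double> linspace(double a, double b, std::size_t n) {
    if (n < 2) {
        throw std::invalid_argument("linspace needs at least 2 points");
    }
    std::vector<double> out(n);
    const double step = (b - a) / static_cast<double>(n - 1);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a + step * static_cast<double>(i);
    }
    out[n - 1] = b;  // pin the last value so rounding cannot overshoot b
    return out;
}

// Hypothetical helper (not from the original post):
// num_bins + 1 logarithmically spaced bin edges covering [min, max].
// Assumes 0 < min < max; a minimum of exactly 0 has no logarithm.
std::vector<double> logBinEdges(double min, double max, std::size_t num_bins) {
    if (!(min > 0.0) || !(max > min)) {
        throw std::invalid_argument("logBinEdges needs 0 < min < max");
    }
    if (num_bins < 1) {
        throw std::invalid_argument("logBinEdges needs at least 1 bin");
    }
    // Space the exponents evenly, then raise the base to each exponent.
    std::vector<double> edges =
        linspace(std::log10(min), std::log10(max), num_bins + 1);
    for (double& e : edges) {
        e = std::pow(10.0, e);
    }
    // Restore the exact endpoints, which the log/pow round trip may perturb.
    edges.front() = min;
    edges.back() = max;
    return edges;
}
```

For example, `logBinEdges(0.001, 1000.0, 6)` returns seven edges, each roughly ten times the previous one, so data below 1.0 is binned the same way as data above it. If the data can contain exact zeros, they would need separate handling (for instance a dedicated underflow bin), since this sketch rejects min == 0.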