I'm looking for a technique to logarithmically bin some data sets. We've got data with values ranging from _min to _max (floats >= 0) and the user needs to be able to specify a varying number of bins _num_bins (some int n).
I've implemented a solution based on this question, with some help on scaling here, but it stops working when my data values lie below 1.0.
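#include <cmath>  // std::log, std::exp

class Histogram {
    double _min, _max;
    int _num_bins;
    ......
};

// Maps a bin index `in` (0 .. _num_bins) to a value between _min and _max.
double Histogram::logarithmicValueOfBin(double in) const {
    if (in == 0.0)
        return _min;
    // Fit y = a * exp(b * x) so that y(_min) == _min and y(_max) == _max.
    double b = std::log(_max / _min) / (_max - _min);
    double a = _max / std::exp(b * _max);
    // Rescale the bin index linearly onto [_min, _max].
    double in_unscaled = in * (_max - _min) / _num_bins + _min;
    return a * std::exp(b * in_unscaled);
}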
When the data values are all greater than 1 I get nicely sized bins and can plot properly. When the values are less than 1 the bins come out more or less the same size and we get way too many of them.
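As a cross-check on the mapping above: substituting a and b into a * exp(b * in_unscaled) shows that, for _min > 0, it reduces to the geometric interpolation _min * (_max / _min)^(in / _num_bins). Below is a minimal sketch of that reduced form. It is written as a free function, and the names binEdge, lo, hi and numBins are hypothetical (they are not part of the Histogram class). It also assumes lo is strictly positive, since the ratio hi / lo is undefined at 0.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical free-function form of the mapping; not part of the original class.
// Returns the i-th of numBins + 1 geometrically spaced edges between lo and hi.
// Assumption: lo > 0 and hi > lo, otherwise the ratio hi / lo is meaningless.
double binEdge(double i, double lo, double hi, int numBins) {
    if (!(lo > 0.0) || !(hi > lo) || numBins <= 0)
        throw std::invalid_argument("binEdge requires 0 < lo < hi and numBins > 0");
    // Equivalent to a * exp(b * x) with a, b and x as defined in the question.
    return lo * std::pow(hi / lo, i / numBins);
}
```

With lo = 0.01, hi = 1.0 and numBins = 2, the edges come out at roughly 0.01, 0.1 and 1.0, so each bin spans one decade.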
I found a solution by reimplementing an open-source version of MATLAB's logspace function.
Given a range and a number of bins, you first need to create an evenly spaced numerical sequence:
// Note: `integers` is a helper from the original library and is not shown here.
module.exports = function linspace(a, b, n) {
    var every = (b - a) / (n - 1),
        ranged = integers(a, b, every);
    return ranged.length == n ? ranged : ranged.concat(b);
}
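After that, loop over each value and raise your base (most likely e, 2 or 10) to that power; the results are your bin edges. Note that a and b here are exponents, not data values: to cover data in [_min, _max] with base 10, pass log10(_min) and log10(_max).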
module.exports.logspace = function logspace(a, b, n) {
    return linspace(a, b, n).map(function(x) { return Math.pow(10, x); });
}
I rewrote this in C++, and it works for any range whose values are greater than 0.
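For anyone who wants a starting point, here is a minimal C++ sketch of the same two steps. It is not the exact rewrite mentioned above, only an illustration of the approach. The base parameter and the logBinEdges helper are assumptions added for convenience, and the sketch assumes the lower bound is strictly positive because the logarithm of 0 is undefined.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// n evenly spaced values from a to b, both endpoints included.
std::vector<double> linspace(double a, double b, std::size_t n) {
    if (n < 2)
        throw std::invalid_argument("linspace requires n >= 2");
    std::vector<double> out(n);
    const double step = (b - a) / static_cast<double>(n - 1);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a + step * static_cast<double>(i);
    out.back() = b;  // pin the last value so rounding cannot miss the endpoint
    return out;
}

// n values base^x for x evenly spaced from a to b (a and b are exponents).
std::vector<double> logspace(double a, double b, std::size_t n, double base = 10.0) {
    std::vector<double> out = linspace(a, b, n);
    for (double& x : out)
        x = std::pow(base, x);
    return out;
}

// Hypothetical helper: numBins + 1 edges for logarithmic bins over [lo, hi].
// Assumption: 0 < lo < hi.
std::vector<double> logBinEdges(double lo, double hi, std::size_t numBins) {
    if (!(lo > 0.0) || !(hi > lo) || numBins == 0)
        throw std::invalid_argument("logBinEdges requires 0 < lo < hi and numBins > 0");
    return logspace(std::log10(lo), std::log10(hi), numBins + 1);
}
```

Calling logBinEdges(0.001, 1.0, 3) gives four edges at roughly 0.001, 0.01, 0.1 and 1.0, so data entirely below 1 still gets one bin per decade. If the data can contain exact zeros, they need separate handling (for example a dedicated underflow bin), since no logarithmic edge can reach 0.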