
How to generate bins for a histogram using Apache Commons Math 3.0 in Java?


I have been looking for a way to generate bins for a specific dataset (by specifying the lower bound, the upper bound and the number of bins required) using Apache Commons Math 3.0. I have looked at Frequency http://commons.apache.org/math/apidocs/org/apache/commons/math3/stat/Frequency.html but it does not give me what I want. I want a method that gives me the frequency of values in an interval (e.g. how many values are between 0 and 5). Any suggestions or ideas?


Solution

  • As far as I know, there is no good histogram class in Apache Commons Math. I ended up writing my own. If all you want are equal-width bins from min to max, then it is quite easy to write.

    Maybe something like this:

    public static int[] calcHistogram(double[] data, double min, double max, int numBins) {
      final int[] result = new int[numBins];
      final double binSize = (max - min) / numBins;

      for (double d : data) {
        // Math.floor keeps values just below min negative; a plain (int) cast
        // truncates toward zero and would count them in bin 0.
        int bin = (int) Math.floor((d - min) / binSize);
        if (bin < 0) { /* this data point is smaller than min; skipped */ }
        else if (bin >= numBins) { /* this data point is max or bigger; skipped */ }
        else {
          result[bin] += 1;
        }
      }
      return result;
    }
    

    Edit: Here's an example.

    double[] data = { 2, 4, 6, 7, 8, 9 };
    int[] histogram = calcHistogram(data, 0, 10, 4);
    // This is a histogram with 4 bins, 0-2.5, 2.5-5, 5-7.5, 7.5-10.
    assert histogram[0] == 1; // one point (2) in range 0-2.5
    assert histogram[1] == 1; // one point (4) in range 2.5-5.
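    assert histogram[2] == 2; // two points (6, 7) in range 5-7.5
    assert histogram[3] == 2; // two points (8, 9) in range 7.5-10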
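
If you also want the interval boundaries next to each count (the "how many values are between 0 and 5" part of the question), here is a minimal, self-contained sketch using only the JDK. The class name `HistogramSketch` and the methods `countBins` and `binEdges` are made up for illustration and are not part of Apache Commons Math. Unlike the method above, this variant assumes you want a value exactly equal to max counted in the last bin; drop that line if you prefer half-open bins throughout.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not an Apache Commons Math API.
final class HistogramSketch {

    /** Counts values per equal-width bin over [min, max]; the last bin includes max. */
    static int[] countBins(double[] data, double min, double max, int numBins) {
        if (numBins <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need numBins > 0 and max > min");
        }
        int[] counts = new int[numBins];
        double binSize = (max - min) / numBins;
        for (double d : data) {
            if (Double.isNaN(d) || d < min || d > max) {
                continue; // assumption: out-of-range and NaN values are ignored
            }
            int bin = (int) ((d - min) / binSize);
            // d == max (or floating-point rounding) can yield numBins; clamp to the last bin.
            counts[Math.min(bin, numBins - 1)]++;
        }
        return counts;
    }

    /** Returns the numBins + 1 boundaries of the bins, from min to max. */
    static double[] binEdges(double min, double max, int numBins) {
        double[] edges = new double[numBins + 1];
        double binSize = (max - min) / numBins;
        for (int i = 0; i < numBins; i++) {
            edges[i] = min + i * binSize;
        }
        edges[numBins] = max; // avoid rounding drift on the final edge
        return edges;
    }

    public static void main(String[] args) {
        double[] data = { 2, 4, 6, 7, 8, 9, 10 };
        int[] counts = countBins(data, 0, 10, 4);
        double[] edges = binEdges(0, 10, 4);
        for (int i = 0; i < counts.length; i++) {
            System.out.println(edges[i] + " to " + edges[i + 1] + ": " + counts[i]);
        }
        System.out.println(Arrays.toString(counts));
    }
}
```

With the sample data in `main`, the counts come out as 1, 1, 2 and 3: the extra value 10 lands in the last bin because of the clamp.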