Search code examples
c++matlabmatlab-coder

What is MATLAB's algorithm to calculate histogram with hist function?


I'm working on translation of some old MATLAB code to C++. I have noticed, that my custom function to calculate histogram that supposed to be equivalent to MATLAB [counts,centers]= hist(___) gives different results. I could not find a bug in my implementation, so I used MATLAB Coder to generate C++ function from MATLAB code and compare it to my C++ code. Here is a simple MATLAB function I used to generate C++ code:

function [counts, centers] = my_hist(values, bins)
    [counts, centers] = hist(values, bins);
    disp(centers);
    disp(counts);
end

And a script to call it, so MATLAB can define inputs:

values = rand(1,1000);
bins = linspace(0.05, 0.95, 10);

[counts, centers] = my_hist(values, bins);

Based on the above, the Coder generates the function:

//
// File: my_hist.cpp
//
// MATLAB Coder version            : 5.3
// C/C++ source code generated on  : 17-Nov-2022 15:46:17
//

// Include Files
#include "my_hist.h"
#include "rt_nonfinite.h"
#include <algorithm>
#include <cmath>
#include <cstring>
#include <math.h>

// Function Definitions
//
// MY_HIST Summary of this function goes here
//    Detailed explanation goes here
//
// Arguments    : const double values[1000]
//                const double bins[10]
//                double counts[10]
//                double centers[10]
// Return Type  : void
//
void my_hist(const double values[1000], const double bins[10],
             double counts[10], double centers[10])
{
  double edges[11];
  double nn[11];
  double absx;
  int k;
  int low_i;
  std::copy(&bins[0], &bins[10], &centers[0]);
  for (k = 0; k < 9; k++) {
    absx = bins[k];
    edges[k + 1] = absx + (bins[k + 1] - absx) / 2.0;
  }
  edges[0] = rtMinusInf;
  edges[10] = rtInf;
  for (k = 0; k < 9; k++) {
    double absx_tmp;
    absx_tmp = edges[k + 1];
    absx = std::abs(absx_tmp);
    if ((!std::isinf(absx)) && (!std::isnan(absx))) {
      if (absx <= 2.2250738585072014E-308) {
        absx = 4.94065645841247E-324;
      } else {
        frexp(absx, &low_i);
        absx = std::ldexp(1.0, low_i - 53);
      }
    } else {
      absx = rtNaN;
    }
    edges[k + 1] = absx_tmp + absx;
  }
  std::memset(&nn[0], 0, 11U * sizeof(double));
  low_i = 1;
  int exitg1;
  do {
    exitg1 = 0;
    if (low_i + 1 < 12) {
      if (!(edges[low_i] >= edges[low_i - 1])) {
        for (low_i = 0; low_i < 11; low_i++) {
          nn[low_i] = rtNaN;
        }
        exitg1 = 1;
      } else {
        low_i++;
      }
    } else {
      for (k = 0; k < 1000; k++) {
        low_i = 0;
        absx = values[k];
        if (!std::isnan(absx)) {
          if ((absx >= edges[0]) && (absx < edges[10])) {
            int high_i;
            int low_ip1;
            low_i = 1;
            low_ip1 = 2;
            high_i = 11;
            while (high_i > low_ip1) {
              int mid_i;
              mid_i = (low_i + high_i) >> 1;
              if (values[k] >= edges[mid_i - 1]) {
                low_i = mid_i;
                low_ip1 = mid_i + 1;
              } else {
                high_i = mid_i;
              }
            }
          }
          if (values[k] == edges[10]) {
            low_i = 11;
          }
        }
        if (low_i > 0) {
          nn[low_i - 1]++;
        }
      }
      exitg1 = 1;
    }
  } while (exitg1 == 0);
  std::copy(&nn[0], &nn[10], &counts[0]);
  counts[9] += nn[10];
}

//
// File trailer for my_hist.cpp
//
// [EOF]
//

I don't understande what happens in this chunk of code and why it is done:

  for (k = 0; k < 9; k++) {
    double absx_tmp;
    absx_tmp = edges[k + 1];
    absx = std::abs(absx_tmp);
    if ((!std::isinf(absx)) && (!std::isnan(absx))) {
      if (absx <= 2.2250738585072014E-308) {
        absx = 4.94065645841247E-324;
      } else {
        frexp(absx, &low_i);
        absx = std::ldexp(1.0, low_i - 53);
      }
    } else {
      absx = rtNaN;
    }
    edges[k + 1] = absx_tmp + absx;
  }

The function shift the edges of bins, but how and why? I will be grateful for help and explanation!


Solution

  • That bit of code adds eps to each bin edge except the first and last.

    It is hard to know why hist does this, they must be working around some edge case they discovered (presumably related to floating-point rounding errors), and figured this was the best or the easiest solution.