matlab histogram binning

How to efficiently bin data in MATLAB


I want to bin some data according to some 'steps', here 1:10, so that bin{1} contains the values >=steps(1) & <steps(2), and so on.

I'd appreciate some tips or feedback from the community. Specifically: is there a common practice for binning data that I haven't found yet, and can the code be improved in terms of efficiency and readability?

data=abs(sin(0:.1:10)*10); %example data
steps=1:10; %user-defined bins
betw=@(x,mi,ma) x(x>=mi & x<ma); %function that returns values between minimum/maximum

bin={};
for ind=1:numel(steps)-1
  bin{ind}=betw(data,steps(ind),steps(ind+1));
end
bin

bin =

  1×9 cell array

  Columns 1 through 7

    {1×7 double}    {1×7 double}    {1×7 double}    {1×8 double}    {1×9 double}    {1×7 double}    {1×10 double}

  Columns 8 through 9

    {1×11 double}    {1×27 double}

Solution

  • The histcounts function would be the "standard" way to do this:

    data = abs(sin(0:.1:10)*10); %example data
    steps = 1:10;                %user-defined bins
    
    hc = histcounts( data, steps );
    % hc =
    %      7     7     7     8     9     7    10    11    27
    

    Note that hc has one element fewer than steps because steps defines the bin edges. The total count sum(hc) equals the number of elements in data that fall between the lowest and highest bin edges; here that is fewer than numel(data), because some elements of data are smaller than the lowest edge in steps. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both edges.
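To make the relationship between sum(hc) and numel(data) concrete, here is a minimal sketch using the example data from the question. The variable name nOutside is introduced here for illustration only.

```matlab
data  = abs(sin(0:.1:10)*10);   % example data from the question
steps = 1:10;                   % bin edges from the question

hc = histcounts(data, steps);

% Elements below the first edge or above the last edge are not counted
nOutside = nnz(data < steps(1) | data > steps(end));

% Every element is either counted in some bin or lies outside the edges
assert(sum(hc) + nOutside == numel(data));
```

With the counts shown above, sum(hc) is 93, while numel(data) is 101.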

    histcounts has further options: it can also return the bin edges, take a number of bins instead of explicit edges, return the bin number of each element, and so on.
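If you need the values in each bin rather than only the counts, as in the cell array from the question, the third output of histcounts (the bin number of each element) can be used for grouping. The following is a sketch, not the only way to do it; the names idx, nBins, and bins are chosen here for illustration.

```matlab
data  = abs(sin(0:.1:10)*10);   % example data from the question
steps = 1:10;                   % bin edges from the question

% counts: elements per bin, edges: the bin edges used,
% idx(k): bin number of data(k), or 0 if data(k) lies outside the edges
[counts, edges, idx] = histcounts(data, steps);

% Collect the values of each bin into a 1-by-nBins cell array
nBins = numel(edges) - 1;
bins  = arrayfun(@(k) data(idx == k), 1:nBins, 'UniformOutput', false);
```

This still scans idx once per bin, so it is mainly a readability change compared with the loop in the question. Keep in mind that the last bin from histcounts includes its right edge, so a value exactly equal to steps(end) would land in the last bin here but would be excluded by the betw function in the question. The example data contains no such value.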

    If all you actually want is the bar plot (as noted in your comment), you can use histogram, which calls histcounts under the hood for the computation but also draws the figure.

    histogram( data, steps );
    

    [Figure: histogram of data with bin edges steps]
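If you want both the plot and the numbers, one option is to keep the object that histogram returns and read the counts from its properties. This is a small sketch; the variable name h is arbitrary.

```matlab
data  = abs(sin(0:.1:10)*10);   % example data from the question
steps = 1:10;                   % bin edges from the question

h = histogram(data, steps);     % draws the plot and returns a histogram object

counts = h.Values;              % bin counts, one per bin
edges  = h.BinEdges;            % the bin edges that were used
```

The counts read from h.Values should match the output of histcounts(data, steps).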