Why does numpy.histogram (Python) leave off one element as compared to hist in Matlab?


I am trying to convert some Matlab code to Python, and the Matlab code looks like:

[N,X] = hist(Isb*1e6, -3:0.01:0)

where Isb is a 2048000-element 1D array. N is output as a 301-element 1D array.

My Python code looks like:

import numpy as np
N,X = np.histogram(Isb*1e6, np.arange(-3, 0.01, 0.01))

but the N that Python outputs is a 300-element 1D array: compared with Matlab's N, the last element is left off.
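The 300-versus-301 difference comes from how `np.histogram` reads an array passed as `bins`: the values are bin edges, so 301 edges delimit 300 bins, and the second return value is those edges rather than bin centers. A minimal check, assuming synthetic uniform data as a stand-in for `Isb*1e6` (the real array is not available here):

```python
import numpy as np

# Synthetic stand-in for Isb*1e6 (assumption: the real data is not available).
rng = np.random.default_rng(0)
data = rng.uniform(-3.5, 0.5, size=10_000)

# 301 values from -3 to 0 in steps of 0.01; linspace avoids the floating-point
# uncertainty in the length of np.arange with a non-integer step.
edges_in = np.linspace(-3.0, 0.0, 301)

counts, edges_out = np.histogram(data, bins=edges_in)

print(len(edges_in), len(counts), len(edges_out))  # 301 300 301
```

Values outside [-3, 0] are not counted at all here; the answer below instead extends the outer bins to -Inf and Inf to mimic Matlab's end bins.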

Is there a way to replicate what Matlab does more accurately?

I need N and X to be the same size so that I can do this:

loc = X < -0.75
I   = N[loc].argmax()
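To see why the sizes matter for this masking step, here is a sketch with synthetic data standing in for `Isb*1e6` (an assumption). With `np.histogram`, `X` holds 301 edges but `N` only 300 counts, so a boolean mask built from `X` cannot index `N`. Taking midpoints of the edges only partly helps: it gives 300 centers, still one fewer than Matlab's, and they are offset by half a bin from -3:0.01:0.

```python
import numpy as np

# Synthetic stand-in for Isb*1e6 (assumption: the real data is not available).
rng = np.random.default_rng(1)
data = rng.normal(-1.5, 0.5, size=10_000)

N, X = np.histogram(data, np.linspace(-3.0, 0.0, 301))  # X holds 301 edges

loc = X < -0.75            # one boolean per edge: 301 entries
try:
    N[loc]                 # N has only 300 counts
    mask_ok = True
except IndexError:
    mask_ok = False        # boolean mask length does not match N

# Midpoints of the edges: 300 values, shifted half a bin from Matlab's centers.
centers = 0.5 * (X[:-1] + X[1:])
print(mask_ok, len(centers), centers[0], centers[-1])
```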

Solution

  • Note that in Matlab's hist(x, vec), vec defines the bin centers, while in Matlab's histc(x, vec) vec defines the bin edges of the histogram. NumPy's histogram works with bin edges. Is this difference important to you? It should be easy to convert from one to the other, and you might have to add -Inf and Inf as the outer bin edges to get it to return the extra bin you want. More or less like this (untested):

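    import numpy as np
    
    def my_hist(x, bin_centers):
        # Matlab's hist takes bin centers; np.histogram needs bin edges.
        bin_centers = np.asarray(bin_centers, dtype=float)
        # Edges halfway between neighbouring centers, with the outer bins
        # extended to -inf/+inf (np.Inf was removed in NumPy 2.0).
        bin_edges = np.r_[-np.inf, 0.5 * (bin_centers[:-1] + bin_centers[1:]),
            np.inf]
        counts, edges = np.histogram(x, bin_edges)
        return counts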
    

    It certainly does not cover all the edge cases that Matlab's hist handles, but you get the idea.
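A usage sketch of the `my_hist` approach, assuming synthetic normal data in place of `Isb*1e6`. It repeats the edge construction with the `np.inf` spelling and converts the centers to an array first, so a plain list also works. The centers are built with `np.linspace(-3.0, 0.0, 301)` to match Matlab's -3:0.01:0, and the mask-and-argmax step from the question is then applied unchanged.

```python
import numpy as np

def my_hist(x, bin_centers):
    """Approximate Matlab's hist(x, centers): one count per center,
    with the outer bins extended to -inf/+inf."""
    bin_centers = np.asarray(bin_centers, dtype=float)
    # Edges halfway between neighbouring centers, plus infinite outer edges.
    bin_edges = np.r_[-np.inf, 0.5 * (bin_centers[:-1] + bin_centers[1:]), np.inf]
    counts, _ = np.histogram(x, bin_edges)
    return counts

# Synthetic stand-in for Isb*1e6 (assumption: the real data is not available).
rng = np.random.default_rng(0)
data = rng.normal(-1.5, 0.5, size=100_000)

X = np.linspace(-3.0, 0.0, 301)   # same 301 centers as Matlab's -3:0.01:0
N = my_hist(data, X)              # 301 counts, one per center

loc = X < -0.75
I = N[loc].argmax()               # index into the masked subset, as in the question
peak = X[loc][I]                  # center of the tallest bin below -0.75
```

Because the outer edges are infinite, every finite value lands in some bin, so `N.sum()` equals the number of samples. `I` indexes the masked subset `N[loc]`, and `X[loc][I]` recovers the corresponding center.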