python numpy matplotlib histogram normalize

Normalize a multiple data histogram


I have several arrays that I'm plotting a histogram of, like so:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(0,.5,1000)
y = np.random.normal(0,.5,100000)

plt.hist((x,y),density=True)  # density=True replaces the older normed=True keyword

However, this normalizes each array individually, so both histograms peak at the same height. I want to normalize them by the total number of elements, so that the histogram of y is visibly taller than that of x. Is there a handy way to do this in matplotlib, or will I have to mess around in numpy? I haven't found anything about it.

Another way to put it is that if I were instead to make a cumulative plot of the two arrays, they shouldn't both top out at 1, but should add to 1.


Solution

  • Yes, you can compute the histogram with numpy and renormalise it.

    x = np.random.normal(0,.5,1000)
    y = np.random.normal(0,.5,100000)
    
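    # density=True replaces normed=True, which recent NumPy releases no longer accept;
    # each histogram is normalised to integrate to 1 over its own bins
    xhist, xbins = np.histogram(x, density=True)
    yhist, ybins = np.histogram(y, density=True)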
    

    Now apply your renormalisation. For example, if you want x to stay normalised to 1 and y to be scaled in proportion to its size:

    yhist *= len(y) / len(x)
    

    Now, to plot the histogram:

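    def plot_histogram(data, edge_bins, **kwargs):
        # bin centres are the midpoints of consecutive edges
        bins = (edge_bins[:-1] + edge_bins[1:]) / 2
        # where='mid' centres each step on its bin centre
        plt.step(bins, data, where='mid', **kwargs)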
    
    plot_histogram(xhist, xbins, c='b')
    plot_histogram(yhist, ybins, c='g')
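
The snippet above leaves x normalised to 1 and lets numpy choose separate bins for each array. If the goal is instead for the two histograms to add up to 1 together, as the question describes, one option is to count on shared bin edges and divide by the combined size. This is a minimal sketch, not the answerer's code: the bin count of 20, the seeded generator, and the names `edges`, `xdens` and `ydens` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
x = rng.normal(0, 0.5, 1000)
y = rng.normal(0, 0.5, 100000)

# Shared bin edges covering both arrays, so the histograms are comparable
edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=20)
widths = np.diff(edges)

# Raw counts per bin for each array
xcounts, _ = np.histogram(x, bins=edges)
ycounts, _ = np.histogram(y, bins=edges)

# Divide by the combined number of elements (and the bin width) so that
# the two densities together integrate to 1
total = len(x) + len(y)
xdens = xcounts / (total * widths)
ydens = ycounts / (total * widths)

x_area = np.sum(xdens * widths)  # fraction of all points that belong to x
y_area = np.sum(ydens * widths)  # fraction of all points that belong to y
```

Here `xdens` and `ydens` can be passed to `plot_histogram` together with `edges`; the y curve then sits well above the x curve because y holds far more points.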
    

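
For the "handy way in matplotlib" part of the question, a possible alternative is the `weights` argument of `plt.hist`: giving every sample a weight of one over the combined size makes the bar heights of both arrays sum to 1. This is a sketch under that assumption rather than part of the original answer; the bin count and labels are illustrative, and the heights are fractions of the total rather than densities.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
x = rng.normal(0, 0.5, 1000)
y = rng.normal(0, 0.5, 100000)

# Each sample contributes 1/total, so all bars of both arrays sum to 1
total = len(x) + len(y)
weights = [np.full(len(x), 1.0 / total), np.full(len(y), 1.0 / total)]

# Passing both arrays at once makes matplotlib use common bins for them
heights, edges, _ = plt.hist((x, y), bins=20, weights=weights, label=["x", "y"])
plt.legend()

total_height = float(np.sum(heights))  # combined height of all bars
```

Call `plt.show()` afterwards to display the figure. Adding `cumulative=True` to the same call should give cumulative curves that together top out at 1 rather than each reaching 1.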