
How to calculate the total volume of a 2D Histogram?


Apologies, I'm relatively new to Python and especially new to using it for statistics. I have two columns of data that I've read in from Excel. I've created a 1D histogram for each column and verified that the area under each one is equal to 1, like so:

n, bins, _ = plt.hist(thickness, 15, range=[0, 8], density=True)
Area_T = sum(numpy.diff(bins) * n)

Now I wish to verify that the volume under a 2D histogram is equal to 1. I have made the 2D histogram, but I'm not sure how to integrate it, since it returns a 2D array.

h, xedges, yedges, _ = plt.hist2d(thickness_data, height_data, bins=(20, 20), density=True)

Solution

  • You can calculate the total volume by multiplying each value in h by the width and height of its corresponding bin, then summing the results:

    import matplotlib.pyplot as plt
    import numpy as np
    
    h, xedges, yedges, _ = plt.hist2d(np.random.randn(1000).cumsum(), np.random.randn(1000).cumsum(), 
                                      bins=(20, 30), density=True)
    total_volume = np.sum(h * np.diff(xedges).reshape(-1, 1) * np.diff(yedges).reshape(1, -1))
    print("total_volume =", total_volume)  # prints 1.0, up to floating-point rounding
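
    The reshape calls above build a grid of bin areas by broadcasting. As a minimal sketch of what happens, the version below uses np.histogram2d (which returns the same h, xedges and yedges, without drawing a plot) and compares the broadcast product with np.outer. The seed and the names dx, dy and bin_areas are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.normal(size=1000).cumsum()
y = rng.normal(size=1000).cumsum()

# Same histogram as plt.hist2d would compute, but without plotting.
h, xedges, yedges = np.histogram2d(x, y, bins=(20, 30), density=True)

dx = np.diff(xedges)  # 20 bin widths along x (first axis of h)
dy = np.diff(yedges)  # 30 bin heights along y (second axis of h)

# A (20, 1) column times a (1, 30) row broadcasts to a (20, 30) grid
# of bin areas, i.e. the outer product of dx and dy.
bin_areas = dx.reshape(-1, 1) * dy.reshape(1, -1)

# Volume = sum over bins of (density value * bin area).
total_volume = np.sum(h * bin_areas)
print("h shape:", h.shape, "bin_areas shape:", bin_areas.shape)
print("same as np.outer:", np.allclose(bin_areas, np.outer(dx, dy)))
print("total_volume =", total_volume)
```

    Since h has one row per x bin and one column per y bin, the x widths must vary along the first axis, which is why they are reshaped into a column.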
    

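    Without density=True, h holds raw counts, so the volume of the histogram is the area of one bin multiplied by the number of samples. This relies on all bins having the same size, which is the case when bins is given as a pair of integers. The total width spanned by the bins is xedges[-1]-xedges[0], and the total height is yedges[-1]-yedges[0]. The area of one bin is this total area divided by the number of bins (20*30=600 in the example).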

    import matplotlib.pyplot as plt
    import numpy as np
    
    h, xedges, yedges, _ = plt.hist2d(np.random.randn(1000).cumsum(), np.random.randn(1000).cumsum(),
                                      bins=(20, 30), density=False)
    total_volume = np.sum(h * np.diff(xedges).reshape(-1, 1) * np.diff(yedges).reshape(1, -1))
    print("total volume :", total_volume)
    print("   predicted :", (xedges[-1] - xedges[0]) * (yedges[-1] - yedges[0]) / 600 * 1000)
    

    This prints for example:

    total volume : 4057.2494712526022
       predicted : 4057.2494712526036
    

    So the two values agree, apart from a very small floating-point rounding error.
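
    The "one bin area times number of samples" prediction depends on equally sized bins, whereas the weighted sum does not. The sketch below illustrates this with deliberately unequal bin edges. The edges, the seed and the variable names are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Hypothetical, deliberately unequal bin edges: narrow in the middle, wide outside.
xedges = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
yedges = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
bin_areas = np.outer(np.diff(xedges), np.diff(yedges))

# Raw counts: the weighted sum is still the volume of the histogram.
counts, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], density=False)
total_volume = np.sum(counts * bin_areas)

# Equal-bin shortcut: (total area / number of bins) * number of counted samples.
total_area = (xedges[-1] - xedges[0]) * (yedges[-1] - yedges[0])
shortcut = total_area / counts.size * counts.sum()

# With density=True the weighted sum is normalized to 1, even for unequal bins.
density, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], density=True)
density_volume = np.sum(density * bin_areas)

print("total volume   :", total_volume)
print("shortcut       :", shortcut)
print("density volume :", density_volume)
```

    Here the shortcut overestimates the volume, because most samples fall into the small central bins, while the density=True check still gives 1 up to rounding.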