I'm relatively new to Python, and especially new to using it for statistics, so apologies in advance. I have two columns of data that I've read in from Excel. I've created a 1D histogram for each column and verified that the area under each one equals 1, like so:
n, bins, _=plt.hist(thickness, 15, range=[0,8], density=True)
Area_T= sum(numpy.diff(bins)*n)
Now I want to verify that the 2D histogram also integrates to 1. I have the 2D histogram, but I'm not sure how to integrate it, since it returns a 2D array.
h, xedges, yedges, _=plt.hist2d(thickness_data, height_data, bins=(20,20), density=True)
You can calculate the total volume by multiplying each value in h by the width and height of its corresponding bin and summing the results:
import matplotlib.pyplot as plt
import numpy as np
h, xedges, yedges, _ = plt.hist2d(np.random.randn(1000).cumsum(), np.random.randn(1000).cumsum(),
bins=(20, 30), density=True)
total_volume = np.sum(h * np.diff(xedges).reshape(-1, 1) * np.diff(yedges).reshape(1, -1))
print("total_volume =", total_volume) # prints "total_volume = 1.0" (up to floating-point rounding)
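If you only need the numbers and not the plot, the same check can be sketched with np.histogram2d, building the matrix of bin areas with np.outer instead of the two reshape calls. This is just an illustration: the seeded random-walk data and the names x, y and bin_areas are my own assumptions, not part of your data.

```python
import numpy as np

# Illustrative data only (assumption): two seeded random walks.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000).cumsum()
y = rng.standard_normal(1000).cumsum()

# np.histogram2d returns the same h, xedges, yedges as plt.hist2d, without drawing.
h, xedges, yedges = np.histogram2d(x, y, bins=(20, 30), density=True)

# bin_areas[i, j] = width of x-bin i times height of y-bin j, shape (20, 30)
bin_areas = np.outer(np.diff(xedges), np.diff(yedges))

total_volume = np.sum(h * bin_areas)
print("close to 1:", np.isclose(total_volume, 1.0))
```

np.outer gives the same result as the broadcasted product of the two reshaped np.diff arrays; which one to use is mostly a matter of taste.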
Without density=True, the volume of the histogram is the area of one bin multiplied by the number of samples. The total width spanned by all bins is xedges[-1]-xedges[0], and the total height is yedges[-1]-yedges[0]. Since the bins are equally sized, the area of one bin is that total area divided by the number of bins (20*30=600 in the example).
import matplotlib.pyplot as plt
import numpy as np
h, xedges, yedges, _ = plt.hist2d(np.random.randn(1000).cumsum(), np.random.randn(1000).cumsum(),
bins=(20, 30), density=False)
total_volume = np.sum(h * np.diff(xedges).reshape(-1, 1) * np.diff(yedges).reshape(1, -1))
print("total volume :", total_volume)
print(" predicted :", (xedges[-1] - xedges[0]) * (yedges[-1] - yedges[0]) / 600 * 1000)
This prints for example:
total volume : 4057.2494712526022
predicted : 4057.2494712526036
So the two values differ only by a very small floating-point rounding error.
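Because of that rounding, it is safer to compare the two numbers with a tolerance than with ==. Here is a rough sketch that also reads the bin count and sample count from h itself rather than hard-coding 600 and 1000. The helper name count_histogram_volume and the seeded test data are hypothetical, and it assumes equally sized bins with every sample falling inside the histogram range (which is the case when no range is passed).

```python
import numpy as np

def count_histogram_volume(x, y, bins):
    """Return (summed volume, predicted volume) for an unnormalized 2D histogram.

    Hypothetical helper; assumes equal-sized bins and that all samples
    fall inside the histogram range.
    """
    h, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Volume obtained by summing count * bin area over all bins.
    total = np.sum(h * np.outer(np.diff(xedges), np.diff(yedges)))
    # Area of one bin = total covered area / number of bins.
    bin_area = (xedges[-1] - xedges[0]) * (yedges[-1] - yedges[0]) / h.size
    # h.sum() is the number of samples that landed in the bins.
    predicted = bin_area * h.sum()
    return total, predicted

# Illustrative data only (assumption): two seeded random walks.
rng = np.random.default_rng(1)
x = rng.standard_normal(1000).cumsum()
y = rng.standard_normal(1000).cumsum()

total, predicted = count_histogram_volume(x, y, bins=(20, 30))
print("match within tolerance:", np.isclose(total, predicted))
```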