I am trying to calculate the area under a seaborn histogram (the data are normalised) using
sum(np.diff(bins_sns)*values_sns)
To get the bin widths and the bar heights I am using
values_sns=[h.get_height()for h in sns.distplot(data).patches]
bins_sns=[h.get_width ()for h in sns.distplot(data).patches]
but their lengths differ (71 and 48), and I get this error: ValueError: operands could not be broadcast together with shapes (71,) (48,)
Any help would be appreciated.
Several things are going wrong here:
- sns.distplot(data) creates a histogram (together with a kdeplot); in the code above it is called twice, which creates two histograms on the same spot.
- sns.distplot(data) returns the ax on which it plotted; an ax contains all the graphical elements of one subplot.
- ax.patches returns a list of all patches that were drawn (patches can be rectangles, circles, closed curves, ...).
- If sns.distplot(data) isn't the first drawing operation on the given ax, ax.patches can contain elements drawn before; in particular, calling sns.distplot(data) twice would double the number of patches (in this case it seems 48 bars were drawn).
- np.diff(bins_sns) would contain the differences between subsequent bin widths. Note that there is one fewer difference than there are values (72 values give 71 differences). As all bin widths are usually equal, np.diff(bins_sns) would be all zeros.

Here is some code that calls sns.distplot(data) only once and multiplies each bar height by its width:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
data = np.random.normal(0, 1, 100)
ax = sns.distplot(data)
values_sns = [h.get_height() for h in ax.patches]
bins_sns = [h.get_width() for h in ax.patches]
total_area = sum([height * width for height, width in zip(values_sns, bins_sns)])
# total_area = np.sum(np.array(bins_sns) * np.array(values_sns)) # shorter, faster using numpy
print("total_area:", total_area) # 1.0
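
If only the area is needed, the np.diff formula from the question does work when it is applied to bin edges rather than bin widths. A minimal sketch, assuming NumPy's np.histogram is used instead of reading the patches back from the plot (the seed and bins="auto" are choices made here for illustration, not something from the question, and the bins need not match the ones seaborn picks):

```python
import numpy as np

# Assumption: a seeded generator, used only so the example is reproducible
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 100)

# density=True normalises the histogram so that it integrates to 1
heights, edges = np.histogram(data, bins="auto", density=True)

# N bars have N + 1 edges, so np.diff(edges) gives exactly N widths
widths = np.diff(edges)
total_area = np.sum(widths * heights)
print("total_area:", total_area)
```

Because the widths come from the edges, widths and heights always have the same length, so the broadcasting error cannot occur.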