I have a stacked histogram made using matplotlib. It has multiple bins (one per sector), and each bin/bar is further segmented into subsectors (stacked histogram).
I'm wondering how I could get the data points, do some math (let's say divide each bin by its total value), and then set the new data points.
How I expect it to work:
import matplotlib.pyplot as plt
ax = plt.subplot(111)
h = ax.hist((subsector1,subsector2,subsector3), bins = 20, stacked=True)
y_data = h.get_yData
The shape of y_data would be something like 20 x 3 (bins x subsectors)
new_y_data = y_data normalized by total on each bin
The shape of new_y_data would also be like 20 x 3, but the sum on each bin would be 1 (or 100%)
new_h = h.set_yData(new_y_data)
new_h would look more like a bar plot, with equally sized bars but a different subsector distribution in each bar.
Is this even possible with matplotlib in Python?
When you only want the values, it's easier to use np.histogram, which does the same calculations without the need to draw.
Once you have the values, plt.bar draws them directly without needing plt.hist.
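If the histogram has already been drawn, the values can also be read from what hist returns. The following is only a sketch with made-up sample data; it assumes that with stacked=True the returned heights are running totals across the datasets (that is how matplotlib behaves as far as I know, but check it on your version), so differencing along the first axis recovers the per-subsector counts.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Made-up sample data standing in for the three subsectors
subsectors = [
    np.clip(rng.normal(70, 20, 400), 0, 100),
    np.clip(rng.normal(50, 20, 1000), 0, 100),
    np.clip(rng.normal(25, 20, 500), 0, 100),
]

fig, ax = plt.subplots()
# hist returns the bar heights, the bin edges and the drawn patches
tops, edges, patches = ax.hist(subsectors, bins=20, stacked=True)

# Assumption: with stacked=True, row k holds the running total of
# subsectors 0..k, so a difference along axis 0 gives the raw counts
tops = np.asarray(tops)                     # shape (3, 20)
counts = np.diff(tops, axis=0, prepend=0)   # shape (3, 20)

# Transpose to bins x subsectors and divide each bin by its total
shares = counts.T / counts.T.sum(axis=1, keepdims=True)   # shape (20, 3)
```

There is no setter that pushes shares back into the drawn histogram in one call, so redrawing with bar is usually the simpler route.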
Pandas' plot.bar might be an alternative. Have a look at "Creating percentage stacked bar chart using groupby" for an example similar to yours.
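A rough sketch of what the pandas route could look like. The DataFrame layout, the column names (value, subsector, bin) and the sample data are all made up for illustration; pd.cut assigns each value to a bin and pd.crosstab with normalize="index" divides each bin by its total.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up long-format data: one row per observation
values = np.concatenate([
    rng.normal(70, 20, 400),
    rng.normal(50, 20, 1000),
    rng.normal(25, 20, 500),
])
df = pd.DataFrame({
    "value": np.clip(values, 0, 100),
    "subsector": ["subsector1"] * 400 + ["subsector2"] * 1000 + ["subsector3"] * 500,
})

# Assign each value to one of 20 equal-width bins over [0, 100];
# labels=False stores the integer bin index instead of an interval
edges = np.linspace(0, 100, 21)
df["bin"] = pd.cut(df["value"], bins=edges, labels=False, include_lowest=True)

# Count per (bin, subsector), then divide each row by its row total
shares = pd.crosstab(df["bin"], df["subsector"], normalize="index")

# One stacked bar per bin, every bar reaching 1.0
ax = shares.plot.bar(stacked=True, width=0.8)
ax.set_ylabel("share of bin")
```

Call plt.show() (after importing matplotlib.pyplot as plt) to display the figure. Note that the x axis shows bin indices here, not the original value range.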
Here is some example code using np.histogram and plt.bar:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Sample data: three subsectors with values clipped to [0, 100]
subsector1 = np.clip(np.random.normal(70, 20, 400), 0, 100)
subsector2 = np.clip(np.random.normal(50, 20, 1000), 0, 100)
subsector3 = np.clip(np.random.normal(25, 20, 500), 0, 100)

# Shared bin edges, so all subsectors are counted over the same bins
num_bins = 20
x_min = np.min(np.concatenate([subsector1, subsector2, subsector3]))
x_max = np.max(np.concatenate([subsector1, subsector2, subsector3]))
bounds = np.linspace(x_min, x_max, num_bins + 1)

# values[b, i] = number of samples of subsector i that fall into bin b
values = np.zeros((num_bins, 3))
for i, subsect in enumerate((subsector1, subsector2, subsector3)):
    values[:, i], _ = np.histogram(subsect, bins=bounds)

# Divide each bin by its total; a bin without any samples becomes NaN
with np.errstate(divide='ignore', invalid='ignore'):
    values /= values.sum(axis=1, keepdims=True)

fig, ax = plt.subplots()
bottom = 0
for i in range(3):
    # Stack each subsector on top of the ones already drawn
    plt.bar((bounds[:-1] + bounds[1:]) / 2, values[:, i], bottom=bottom, width=np.diff(bounds) * 0.8)
    bottom += values[:, i]
plt.xlim(x_min, x_max)
plt.gca().yaxis.set_major_formatter(PercentFormatter(1.0))
plt.show()
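One detail about the normalization step: a bin that is empty in every subsector gives 0/0, which NumPy evaluates to NaN (np.errstate only silences the warning). If you would rather have such bins come out as zero, a possible variant uses np.divide with its where argument. The helper name normalize_rows and the small counts array are made up for this sketch.

```python
import numpy as np

def normalize_rows(counts):
    """Divide each row by its sum; rows that sum to zero stay all-zero."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    shares = np.zeros_like(counts)
    # Only divide where the row total is positive; elsewhere keep the 0
    np.divide(counts, totals, out=shares, where=totals > 0)
    return shares

# Hypothetical counts: 3 bins x 3 subsectors, the middle bin is empty
counts = np.array([[2, 1, 1],
                   [0, 0, 0],
                   [0, 3, 1]])
shares = normalize_rows(counts)
```

Using it in the example above would mean replacing the errstate block with values = normalize_rows(values).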