python matplotlib histogram matlab-figure

Is it possible to manipulate the data in a matplotlib histogram using Get and Set?

I have a stacked histogram made with matplotlib. It has multiple bins (one per sector), and each bin/bar is further segmented into subsectors (hence the stacked histogram).

I'm wondering how I could get the data points, do some math (say, divide each bin by its total value), and then set the new data points.

How I expect it to work:

import matplotlib.pyplot as plt
ax = plt.subplot(111)
h = ax.hist((subsector1,subsector2,subsector3), bins = 20, stacked=True)

y_data = h.get_yData

The shape of y_data would be something like 20 x 3 (bins x subsectors)

new_y_data = y_data normalized by total on each bin

The shape of new_y_data would also be like 20 x 3, but the sum on each bin would be 1 (or 100%)

new_h = h.set_yData(new_y_data)

new_h would look more like a bar plot, with equal-sized bars but a different subsector distribution in each bar.

Is this even possible in python matplotlib?

Solution

If you only want the values, it's easier to use np.histogram, which does the same calculations without drawing anything.
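As a small illustration of that point, here is a sketch of what np.histogram hands back. The data is made up and the names `sample`, `counts` and `edges` are only placeholders for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible
sample = np.clip(rng.normal(50, 20, 1000), 0, 100)  # made-up data

# np.histogram returns the per-bin counts and the bin edges; nothing is drawn
counts, edges = np.histogram(sample, bins=20)

print(counts.shape)  # one count per bin
print(edges.shape)   # one more edge than there are bins
print(counts.sum())  # every sample falls into some bin
```

Because the counts are a plain NumPy array, any arithmetic (such as dividing by a per-bin total) can be done before anything is plotted.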

Once you have the values, plt.bar draws them directly, with no need for plt.hist.

Pandas plot.bar might be an alternative. Have a look at "Creating percentage stacked bar chart using groupby" for an example similar to yours.
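For the pandas route, a rough sketch could look like the following. The counts are invented for illustration, and the column and index names are hypothetical; in practice the table would hold your per-bin counts for each subsector.

```python
import pandas as pd

# Hypothetical per-bin counts: one row per bin, one column per subsector
df = pd.DataFrame(
    [[10, 30, 60], [20, 20, 10], [5, 0, 15]],
    columns=["subsector1", "subsector2", "subsector3"],
    index=["bin 0", "bin 1", "bin 2"],
)

# Divide every row by its own total so each bin sums to 1
share = df.div(df.sum(axis=1), axis=0)

# Draw the shares as a stacked bar chart, one bar per bin
ax = share.plot.bar(stacked=True, width=0.8)
ax.set_ylabel("share of bin total")
```

Call plt.show() from matplotlib.pyplot (or save the figure) to view the result. Note that a bin whose total is zero would come out as NaN here and would need separate handling.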

Here is some example code using np.histogram and plt.bar:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Example data: three subsectors with values clipped to the range 0..100
subsector1 = np.clip(np.random.normal(70, 20, 400), 0, 100)
subsector2 = np.clip(np.random.normal(50, 20, 1000), 0, 100)
subsector3 = np.clip(np.random.normal(25, 20, 500), 0, 100)
num_bins = 20
# Use the same bin edges for every subsector so the bars line up
all_data = np.concatenate([subsector1, subsector2, subsector3])
x_min = all_data.min()
x_max = all_data.max()
bounds = np.linspace(x_min, x_max, num_bins + 1)
# values[b, s] is the count of subsector s in bin b (shape: bins x subsectors)
values = np.zeros((num_bins, 3))
for i, subsect in enumerate((subsector1, subsector2, subsector3)):
    values[:, i], _ = np.histogram(subsect, bins=bounds)
# Normalize each bin by its total; an empty bin gives 0/0 = NaN and is not drawn
with np.errstate(divide='ignore', invalid='ignore'):
    values /= values.sum(axis=1, keepdims=True)
fig, ax = plt.subplots()
# Stack the subsectors by starting each bar where the previous one ended
bottom = 0
for i in range(3):
    ax.bar((bounds[:-1] + bounds[1:]) / 2, values[:, i], bottom=bottom, width=np.diff(bounds) * 0.8)
    bottom += values[:, i]
ax.set_xlim(x_min, x_max)
# Show the y axis as percentages, with 1.0 corresponding to 100%
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
plt.show()
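If you do want something closer to the get/set workflow from the question, the rectangles that ax.hist draws can be read and modified afterwards. The following is only a sketch under some assumptions: it relies on the default histtype ('bar'), where ax.hist returns one container of Rectangle patches per dataset, and the data and variable names are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Made-up data standing in for the three subsectors
data = [np.clip(rng.normal(mu, 20, n), 0, 100)
        for mu, n in ((70, 400), (50, 1000), (25, 500))]

fig, ax = plt.subplots()
# With several datasets, the third return value holds one container per dataset
_, edges, containers = ax.hist(data, bins=20, stacked=True)

# "Get": read the drawn segment heights into a (bins x subsectors) array
heights = np.array([[rect.get_height() for rect in c] for c in containers]).T

# Normalize each bin by its total; bins without any data stay at zero
totals = heights.sum(axis=1, keepdims=True)
shares = np.divide(heights, totals, out=np.zeros_like(heights), where=totals > 0)

# "Set": write the new heights back, restacking each segment on those below it
bottoms = np.zeros(len(edges) - 1)
for j, c in enumerate(containers):
    for i, rect in enumerate(c):
        rect.set_y(bottoms[i])
        rect.set_height(shares[i, j])
    bottoms += shares[:, j]

# The old y limits were chosen for raw counts, so reset them for shares
ax.set_ylim(0, 1)
```

Call plt.show() afterwards to view the figure. This works, but it touches each rectangle individually, so computing the values with np.histogram and drawing them with bar, as above, is usually the simpler route.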