python pandas matplotlib stacked-chart plot-annotations

How to create and annotate a stacked proportional bar chart

I'm struggling to create a stacked bar chart derived from value_counts() of the columns of a dataframe.

Assume a dataframe like the following, where responder is not important; I would like to stack the counts of [1,2,3,4,5] for each q# column.

responder, q1, q2, q3, q4, q5
------------------------------
r1, 5, 3, 2, 4, 1
r2, 3, 5, 1, 4, 2
r3, 2, 1, 3, 4, 5
r4, 1, 4, 5, 3, 2
r5, 1, 2, 5, 3, 4
r6, 2, 3, 4, 5, 1
r7, 4, 3, 2, 1, 5
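
For reproducibility, the sample above can be built and tallied roughly as follows. This is only a sketch: the names df and counts are illustrative, and the reindex/fillna step is an assumption added so that values missing from a column (for example, no 2 in q4) show up as 0 instead of NaN.

```python
import pandas as pd

# The seven sample responses from the table above (responder column omitted).
df = pd.DataFrame(
    {
        "q1": [5, 3, 2, 1, 1, 2, 4],
        "q2": [3, 5, 1, 4, 2, 3, 3],
        "q3": [2, 1, 3, 5, 5, 4, 2],
        "q4": [4, 4, 4, 3, 3, 5, 1],
        "q5": [1, 2, 5, 2, 4, 1, 5],
    }
)

# Count how often each answer value occurs in every q# column.
# Rows are the answer values 1..5, columns are q1..q5.
counts = (
    df.apply(lambda s: s.value_counts())
    .reindex([1, 2, 3, 4, 5])  # fixed row order, one row per answer value
    .fillna(0)                 # a value that never occurs in a column counts as 0
    .astype(int)
)
print(counts)
```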

The result should look something like a stacked bar chart in which each bar is labeled by q# and has 5 sections, one for the count of each value in [1,2,3,4,5] from the data.

Ideally, all bars will be "100%" wide, showing each count as a proportion of the bar. But it's guaranteed that each responder row will have one entry for each question, so the percentage is just a bonus if possible.

Any help would be much appreciated, with a slight preference for a matplotlib solution.

Solution

You can compute the width of each bar segment as a proportion of its bar and draw the stacked bar plot with ax = percents.T.plot(kind='barh', stacked=True), where percents is a DataFrame with q1,...,q5 as columns and 1,...,5 as the index. Each column of percents is divided by its own total, so every bar adds up to exactly 1.

>>> percents
       q1      q2      q3      q4      q5
1  0.2015  0.2040  0.2115  0.1995  0.2070
2  0.2070  0.1905  0.2070  0.2070  0.1965
3  0.1965  0.2115  0.1795  0.1905  0.1935
4  0.1970  0.1970  0.1885  0.1960  0.2090
5  0.1980  0.1970  0.2135  0.2070  0.1940

Then you can use ax.patches to add labels for every bar segment. The labels come from the original counts DataFrame: counts = df.apply(lambda x: x.value_counts())

>>> counts
    q1   q2   q3   q4   q5
1  403  408  423  399  414
2  414  381  414  414  393
3  393  423  359  381  387
4  394  394  377  392  418
5  396  394  427  414  388

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# create some data similar to yours
np.random.seed(42)
categories = ['q1','q2','q3','q4','q5']
df = pd.DataFrame(np.random.randint(1,6,size=(2000, 5)), columns=categories)

# counts will be used for the labels
counts = df.apply(lambda x: x.value_counts())

# percents will be used to determine the width of each bar segment;
# divide each column (question) by its own total so every bar sums to 1
percents = counts.div(counts.sum(axis=0), axis=1)

# pandas draws one series per answer value (row of counts), each with one
# patch per question, so row-major (row, column) indices match ax.patches
counts_array = counts.values
nrows, ncols = counts_array.shape
indices = [(i, j) for i in range(nrows) for j in range(ncols)]

ax = percents.T.plot(kind='barh', stacked=True)
ax.legend(bbox_to_anchor=(1, 1.01), loc='upper right')
for i, p in enumerate(ax.patches):
    # anchor both labels near the right edge of the segment
    x = p.get_x() + p.get_width() - 0.15
    # the segment width is a fraction, so format it as a percentage
    ax.annotate(f"({p.get_width():.1%})", (x, p.get_y() - 0.10), xytext=(5, 10), textcoords='offset points')
    ax.annotate(str(counts_array[indices[i]]), (x, p.get_y() + 0.10), xytext=(5, 10), textcoords='offset points')
plt.show()
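
As a possible alternative to positioning the annotations by hand, the sketch below centers one label in each segment with Axes.bar_label. It assumes matplotlib 3.4 or newer (where bar_label was added) and is not the method used above. It also generates its own random data with a different generator, so the counts will differ from the tables shown, and the label format (count above percentage) is just one choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random survey-like data: 2000 responders, answers 1..5 for q1..q5.
rng = np.random.default_rng(42)
categories = ["q1", "q2", "q3", "q4", "q5"]
df = pd.DataFrame(rng.integers(1, 6, size=(2000, 5)), columns=categories)

# Rows are answer values 1..5, columns are questions.
counts = (
    df.apply(lambda s: s.value_counts())
    .reindex(range(1, 6))
    .fillna(0)
    .astype(int)
)
# Normalize each question (column) by its own total so every bar sums to 1.
percents = counts.div(counts.sum(axis=0), axis=1)

ax = percents.T.plot(kind="barh", stacked=True)
ax.legend(title="answer", bbox_to_anchor=(1.02, 1), loc="upper left")
ax.set_xlim(0, 1)

# pandas creates one bar container per answer value, in index order;
# each container holds one patch per question.
for container, answer in zip(ax.containers, counts.index):
    labels = [
        f"{counts.loc[answer, q]}\n({percents.loc[answer, q]:.1%})"
        for q in counts.columns
    ]
    ax.bar_label(container, labels=labels, label_type="center", fontsize=8)

plt.tight_layout()
```

Call plt.show() or plt.savefig(...) afterwards to display or save the figure. Because the labels are built from counts and percents directly, they do not depend on the order of ax.patches.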