pandas plotly data-visualization histogram

Stacked histogram with percentage annotations

I have a pandas dataframe that looks like this

I want to create a stacked or grouped histogram in Plotly and display, for each box, the percentage of that box's values that fall into each bin. For example, if box A has 20 values in total and 5 of them fall in the 0-10 bin, the bar for box A in that bin should be labelled 25%, and so on.

Note: I don't want the percentage relative to the total number of values in a bin across all boxes; I want the percentage relative to each box's own total.

What changes do I need to make here?

px.histogram(sample_data,
             color="Box",
             nbins=10,
             barmode="group",
             range_x=[0, 100],
             )

Solution

imports

import pandas as pd
import plotly.express as px

data

### dummy data
box = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B']
values = [1,1,1,9,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,1]
df = pd.DataFrame({'box':box, 'values':values})
df

data transforms


### bin your data
bins = [0, 10, 25, 50, 100]
df['binned'] = pd.cut(df['values'], bins)

### count values per box and bin
### (observed=False is passed explicitly because 'binned' is categorical)
df_group = df.groupby(['box', 'binned'], as_index=False, observed=False).count()

### calculate percentages per box: each count divided by that box's total
df_group['percentage'] = 100 * df_group['values'] / df_group.groupby('box')['values'].transform('sum')
df_group
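
As a cross-check on the percentages, the same per-box normalisation can be sketched with pd.crosstab, where normalize='index' divides each row (one box) by that row's own total. This is an alternative I am suggesting, not something the plot below depends on, and the data is dummy data shaped like the example above.

```python
import pandas as pd

# Dummy data shaped like the example above (20 values in box A, 3 in box B)
df = pd.DataFrame({
    'box': ['A'] * 20 + ['B'] * 3,
    'values': [1, 1, 1, 9, 10] + [11] * 15 + [11, 11, 1],
})

bins = [0, 10, 25, 50, 100]
df['binned'] = pd.cut(df['values'], bins)

# normalize='index' scales each row so that one box sums to 1;
# normalize='columns' would instead give the per-bin share across boxes
pct = pd.crosstab(df['box'], df['binned'], normalize='index') * 100
print(pct.round(2))
```

Box A has 5 of its 20 values in the first bin, so its entry there is 25%, which matches the example in the question.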

plot

### plot
px.bar(df_group, x="box", y="values", color="binned",
       text=df_group['percentage'].apply(lambda x: '{0:1.2f}%'.format(x)))