
Stacked histogram with percentage annotations


I have a pandas DataFrame that looks like this:

    Box Value
0   A   90
1   B   16
2   B   26
3   B   36
4   C   92
5   A   12
6   B   14
7   C   84
8   A   27
9   B   11

I want to create a stacked or grouped histogram in Plotly that displays, for each box, the percentage of that box's values falling into each bin. For example, if box A has 20 values in total and 5 of them fall into the bin 0-10, the bar for box A in that bin should show 25%, and so on.

Note: I don't want the percentage relative to all values that fall into a bin across every box; I want each box's own percentage in a particular bin, i.e. relative to that box's total.
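To make the distinction concrete, here is a small pandas sketch on the sample data above. It assumes ten right-closed bins of width 10 (the labels "0-10", "10-20", ... are chosen here for readability) and uses `pd.crosstab`: `normalize="index"` gives the per-box percentages the question asks for, while `normalize="columns"` gives the per-bin percentages the note rules out.

```python
import pandas as pd

# Sample data from the question
sample_data = pd.DataFrame({
    "Box": ["A", "B", "B", "B", "C", "A", "B", "C", "A", "B"],
    "Value": [90, 16, 26, 36, 92, 12, 14, 84, 27, 11],
})

# Ten right-closed bins (0, 10], (10, 20], ..., (90, 100], labelled "0-10", "10-20", ...
edges = list(range(0, 101, 10))
labels = [f"{lo}-{lo + 10}" for lo in edges[:-1]]
binned = pd.cut(sample_data["Value"], bins=edges, labels=labels)

# Wanted: percent of each box's values per bin (every row sums to 100)
per_box = pd.crosstab(sample_data["Box"], binned, normalize="index") * 100

# Not wanted: percent of each bin's values per box (every non-empty column sums to 100)
per_bin = pd.crosstab(sample_data["Box"], binned, normalize="columns") * 100

print(per_box.round(1))
```

For example, three of B's five values fall into 10-20, so B gets 60% there per box, whereas the per-bin view would give B 75% of that bin (3 of the 4 values in it).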

What changes do I need to make here?

    import plotly.express as px

    px.histogram(sample_data,
                 x="Value",
                 color="Box",
                 nbins=10,
                 barmode="group",
                 range_x=[0, 100],
                )

Solution

  • Imports

    import pandas as pd
    import plotly.express as px
    

  • Data

    ### dummy data
    box = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B']
    values = [1,1,1,9,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,1]
    df = pd.DataFrame({'box':box, 'values':values})
    df
    



  • Data transforms

    
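    ### bin your data into right-closed intervals (0, 10], (10, 25], ...
    bins = [0, 10, 25, 50, 100]
    df['binned'] = pd.cut(df['values'], bins)
    
    ### count values per (box, bin); observed=False keeps empty bins as zero rows
    df_group = df.groupby(['box', 'binned'], observed=False, as_index=False).count()
    
    ### calculate percentages per box: each count divided by that box's total count
    df_group['percentage'] = 100 * df_group['values'] / df_group.groupby('box')['values'].transform('sum')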
    df_group
    



  • Plot

    ### plot
    ### plot: one stacked bar per box; bar height = count, label = percentage of that box
    px.bar(df_group, x="box", y="values", color="binned",
           text=df_group['percentage'].apply(lambda x: '{:.2f}%'.format(x)))
    
