Tags: python, pandas, dataframe, matplotlib

Stacked bar plot using matplotlib and pandas dataframe


I have some data:

import pandas as pd

df = pd.DataFrame({
    'Plan': [40, 50, 60, 25],
    'Fact': [10, 20, 30, 15],
    'financing_type': ['type_1', 'type_2', 'type_1', 'type_3']
})

I need to plot two stacked bars, one for Plan and one for Fact, where each colored segment is the sum of that column for one financing_type.

This is how I did it:

import matplotlib.pyplot as plt

df_type_1 = df[df['financing_type'] == 'type_1']
df_type_2 = df[df['financing_type'] == 'type_2']
df_type_3 = df[df['financing_type'] == 'type_3']

plt.bar(['Plan', 'Fact'], [df_type_1['Plan'].sum(), df_type_1['Fact'].sum()], color='blue', label='type_1')
plt.bar(
    ['Plan', 'Fact'],
    [df_type_2['Plan'].sum(), df_type_2['Fact'].sum()], 
    bottom=[df_type_1['Plan'].sum(), df_type_1['Fact'].sum()], 
    color='red', 
    label='type_2',
)
plt.bar(
    ['Plan', 'Fact'],
    [df_type_3['Plan'].sum(), df_type_3['Fact'].sum()], 
    bottom=[df_type_1['Plan'].sum() + df_type_2['Plan'].sum(), df_type_1['Fact'].sum() + df_type_2['Fact'].sum()], 
    color='green', 
    label='type_3',
)
plt.legend()
plt.show()

How can I do this in the general case, when I don't know in advance how many distinct types the financing_type column contains?


Solution

  • Here is an approach:

    • Melt the 'Plan' and 'Fact' columns to create a long form dataframe
    • Create a pivot_table, summing the values for each type
    • Create a stacked bar plot from the pivot table

    import matplotlib.pyplot as plt
    import pandas as pd
    
    # Given a dataframe
    df = pd.DataFrame({
        'Plan': [40, 50, 60, 25],
        'Fact': [10, 20, 30, 15],
        'financing_type': ['type_1', 'type_2', 'type_1', 'type_3']})
    
    # Melt the dataframe into long form: one row per (financing_type, Category) value
    df_melted = df.melt(id_vars=['financing_type'], var_name='Category', value_name='Value')
    
    # Pivot the dataframe to get the sum of 'Plan' and 'Fact' for each 'financing_type'
    df_pivot = df_melted.pivot_table(index='Category', columns='financing_type', values='Value', aggfunc='sum')
    
    # Reorder the index so 'Plan' is drawn before 'Fact' (pivot_table sorts it alphabetically)
    df_pivot = df_pivot.reindex(['Plan', 'Fact'])
    
    # Create a stacked bar plot: one bar per row, one colored segment per financing type
    df_pivot.plot.bar(stacked=True, rot=0, xlabel='')
    plt.show()
    

    [Image: pandas stacked bar plot from summed values]
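    The melt and pivot steps above can probably be collapsed into a single groupby. This is only a sketch of an equivalent route, not part of the original answer; the name `totals` is made up for illustration. It sums both columns per financing type and transposes the result, so 'Plan' and 'Fact' end up as rows in the order they were selected and no reindex is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'Plan': [40, 50, 60, 25],
    'Fact': [10, 20, 30, 15],
    'financing_type': ['type_1', 'type_2', 'type_1', 'type_3']})

# Sum 'Plan' and 'Fact' per financing type, then transpose so that
# 'Plan' and 'Fact' become the rows (one bar each) and every financing
# type becomes a column (one stacked segment each).
# `totals` is a hypothetical name chosen for this sketch.
totals = df.groupby('financing_type')[['Plan', 'Fact']].sum().T

# Draw the stacked bars; works for any number of financing types
ax = totals.plot.bar(stacked=True, rot=0, xlabel='')
```

    In a script, call `plt.show()` from `matplotlib.pyplot` afterwards to display the figure.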

    Alternatively, you can use seaborn to create a stacked, weighted histogram:

    import seaborn as sns
    import pandas as pd
    
    # Given a dataframe
    df = pd.DataFrame({
        'Plan': [40, 50, 60, 25],
        'Fact': [10, 20, 30, 15],
        'financing_type': ['type_1', 'type_2', 'type_1', 'type_3']})
    
    # Melt the DataFrame
    df_melted = df.melt(id_vars=['financing_type'], var_name='Category', value_name='Value')
    
    # Create a stacked, weighted histogram 
    sns.histplot(df_melted, x='Category', hue='financing_type', weights='Value', multiple='stack', alpha=1)
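
    If you would rather stay close to the plain matplotlib code from the question, the three hand-written `plt.bar` calls can be generalized with a loop that keeps a running `bottom`. This is a minimal sketch, assuming the value columns are known to be 'Plan' and 'Fact'; the names `columns`, `totals` and `bottom` are illustrative and not from the original post.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Plan': [40, 50, 60, 25],
    'Fact': [10, 20, 30, 15],
    'financing_type': ['type_1', 'type_2', 'type_1', 'type_3']})

columns = ['Plan', 'Fact']  # assumed value columns, one bar each

# One row per financing type, holding the summed 'Plan' and 'Fact'
totals = df.groupby('financing_type')[columns].sum()

fig, ax = plt.subplots()
bottom = np.zeros(len(columns))  # current height of each stacked bar

# Draw one segment per financing type on top of the previous ones
for financing_type, row in totals.iterrows():
    heights = row.to_numpy()
    ax.bar(columns, heights, bottom=bottom, label=financing_type)
    bottom = bottom + heights  # the next segment starts where this one ends

ax.legend()
```

    Colors come from matplotlib's default color cycle here rather than the fixed blue, red and green of the question; pass `color=` inside the loop if specific colors matter. Call `plt.show()` to display the figure when running as a script.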