Tags: python, pandas, matplotlib, jupyter-notebook, pie-chart

How to create a subplot for each group of a pandas column


In the Titanic dataset, I need to create a chart that shows the percentage of passengers in each class who survived. It should have three pie charts: class 1 survived and not survived, class 2 survived and not survived, and the same for class 3.

How can I make this happen? I already tried this type of code, but it produces wrong values.

import pandas as pd
import seaborn as sns  # for dataset

df_titanic = sns.load_dataset('titanic')

   survived  pclass     sex   age  sibsp  parch     fare embarked  class    who  adult_male deck  embark_town alive  alone
0         0       3    male  22.0      1      0   7.2500        S  Third    man        True  NaN  Southampton    no  False
1         1       1  female  38.0      1      0  71.2833        C  First  woman       False    C    Cherbourg   yes  False
2         1       3  female  26.0      0      0   7.9250        S  Third  woman       False  NaN  Southampton   yes   True

c1s = len(df_titanic[(df_titanic.pclass==1) & (df_titanic.survived==1)].value_counts())
c1ns = len(df_titanic[(df_titanic.pclass==1) & (df_titanic.survived==0)].value_counts())

This code produces the correct values, but I need them in three pie charts:

df_titanic.groupby(['pclass' ,'survived']).size().plot(kind='pie', autopct='%.2f')


The values are pclass: 1, 2, 3 and survived: 0, 1.


Solution

    1. The correct way to get subplots using pandas is to reshape the dataframe. pandas.crosstab is used to reshape it so that each pclass is a column and each survived value is a row.
    2. Then plot using pandas.DataFrame.plot with kind='pie' and subplots=True, which draws one pie chart per column.
    • Extra code has been added for formatting:
      • rotating the pclass label
      • plot title
      • custom legend instead of a legend for each subplot
        • specify the labels for the legend
        • specify as many colors as there are labels
    • Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3
    import seaborn as sns  # for titanic data only
    import pandas as pd
    from matplotlib.patches import Patch  # to create the colored squares for the legend
    
    # load the dataframe
    df = sns.load_dataset('titanic')
    
    # reshaping the dataframe is the most important step
    ct = pd.crosstab(df.survived, df.pclass)
    
    # display(ct)
    # pclass      1   2    3
    # survived
    # 0          80  97  372
    # 1         136  87  119
    
    # plot and add labels
    colors = ['tab:blue', 'tab:orange']  # specify the colors so they can be used in the legend
    labels = ["not survived", "survived"]  # used for the legend
    axes = ct.plot(kind='pie', autopct='%.1f%%', subplots=True, figsize=(12, 5),
                   legend=False, labels=['', ''], colors=colors)
    
    # flatten the array of axes
    axes = axes.flat
    
    # extract the figure object
    fig = axes[0].get_figure()
    
    # rotate the pclass label
    for ax in axes:
        yl = ax.get_ylabel()
        ax.set_ylabel(yl, rotation=0, fontsize=12)
        
    # create the legend
    legend_elements = [Patch(fc=c, label=l) for c, l in zip(colors, labels)]
    fig.legend(handles=legend_elements, loc=9, fontsize=12, ncol=2, borderaxespad=0,
               bbox_to_anchor=(0., 0.8, 1, .102), frameon=False)
    
    fig.tight_layout()
    fig.suptitle('pclass survival', fontsize=15)
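
The percentages that autopct prints are each slice's share of its own pclass column, so they can be checked numerically. Below is a minimal sketch that rebuilds the table from the counts displayed above instead of loading the dataset (the name pct is only illustrative). On the real data, pd.crosstab(df.survived, df.pclass, normalize='columns') should give the same proportions.

```python
import pandas as pd

# counts copied from the crosstab output above (rows: survived, columns: pclass)
ct = pd.DataFrame(
    {1: [80, 136], 2: [97, 87], 3: [372, 119]},
    index=pd.Index([0, 1], name="survived"),
)
ct.columns.name = "pclass"

# divide each column by its own total so every pclass sums to 100%
pct = ct.div(ct.sum(axis=0), axis=1).mul(100).round(1)
print(pct)
```

For first class this gives 37.0 for not survived and 63.0 for survived, which is what the first pie should show.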
    

    Formatted Figure


    Unformatted Figure

    axes = ct.plot(kind='pie', autopct='%.1f%%', subplots=True, figsize=(12, 5), labels=["not survived", "survived"])
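
A side note on the counting attempt in the question: len(df[mask].value_counts()) likely gives wrong numbers because DataFrame.value_counts() drops rows that contain NaN (the default dropna=True) and merges identical rows, so its length is the number of distinct complete rows rather than the number of passengers. Below is a minimal sketch on a hypothetical five-row frame (not the real data) with a more direct way to count:

```python
import numpy as np
import pandas as pd

# hypothetical miniature of the titanic frame: one NaN in 'deck', one duplicated row
df = pd.DataFrame({
    "pclass":   [1, 1, 1, 1, 2],
    "survived": [1, 1, 1, 0, 0],
    "deck":     ["C", "C", np.nan, "B", np.nan],
})

mask = (df.pclass == 1) & (df.survived == 1)

# value_counts() drops the NaN row and merges the two identical rows,
# so this is the number of distinct complete rows, not the number of passengers
wrong = len(df[mask].value_counts())

# counting the selected rows directly
right = int(mask.sum())

# all pclass/survived combinations at once
counts = df.groupby(["pclass", "survived"]).size()

print(wrong, right)
print(counts)
```

Here wrong is 1 while right is 3. The groupby(...).size() result carries the same counts that pd.crosstab arranges into a table.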
    
