
Countplot with relative frequencies or density curves


I am trying to visualize categorical data for three groups (hues) of data. Using seaborn, it seems like countplot() may do the trick (the second example in the documentation link below looks like what I need). But instead of counts on the y-axis, is it possible to make this the proportion by group?

In the second example in the link, the Man group (blue bar) would be roughly 22%, 18%, 60% in first, second, and third class, respectively, rather than the counts. The same would be done for the Woman and Child groups.

Seaborn Example


Solution

  • As far as I know, this isn't an option directly in Seaborn, but you can manually create a proportional counts data set and plot with sns.barplot:

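    import matplotlib.pyplot as plt
    import seaborn as sns

    df = sns.load_dataset('titanic')
    # [1] Simple count
    sns.countplot(x='class', data=df)
    plt.show()
    # [1b] By percent
    # rename_axis + reset_index(name=...) yields columns 'class' and 'percent' in both older and newer pandas
    pct = df['class'].value_counts(normalize=True).rename_axis('class').reset_index(name='percent')
    sns.barplot(x='class', y='percent', data=pct)
    plt.show()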
    
    # [2] Two var count
    sns.countplot(x='class', hue='who', data=df)
    plt.show()
    # [2b] By percent
    pct2 = (df.groupby(['class','who']).size() / df.groupby(['class']).size()).reset_index().rename({0:'percent'}, axis=1)
    sns.barplot(x='class', hue='who', y='percent', data=pct2)
    plt.show()
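
The division in [2b] works because pandas aligns the two-level (class, who) counts with the per-class totals on the shared 'class' level. Here is a minimal sketch that spells out that alignment with Series.div(level=...). The toy frame and the names toy, counts, class_totals and pct_by_class are made up for illustration; they are not from the titanic data.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'class' and 'who' columns
toy = pd.DataFrame({
    "who":   ["man", "man", "man", "man", "woman", "woman", "child", "child"],
    "class": ["First", "Third", "Third", "Third", "First", "Second", "Third", "Third"],
})

# Numerator: one count per (class, who) pair
counts = toy.groupby(["class", "who"]).size()
# Denominator: one total per class
class_totals = toy.groupby("class").size()

# Divide each (class, who) count by the total of its class,
# matching on the 'class' level of the MultiIndex
pct_by_class = (
    counts.div(class_totals, level="class")
    .rename("percent")
    .reset_index()
)

# Within every class the shares should add up to 1
print(pct_by_class.groupby("class")["percent"].sum())
```

The resulting long-format frame has the columns 'class', 'who' and 'percent', so it can be passed to sns.barplot in the same way as pct2.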
    



    Edits per comment

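    You can change which percentage is calculated by changing the denominator of the fraction used to build the pct dataframe. Dividing by the per-class totals (as in [2b]) gives each group's share within a class; dividing by the per-who totals gives each class's share within a group.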

    # [3] Grouped by 'class'; hue by 'who'
    # IIUC, this is what you're asking for
    pct3 = (df.groupby(['class','who']).size() / df.groupby(['who']).size()).reset_index().rename({0:'percent'}, axis=1)
    sns.barplot(x='class', hue='who', y='percent', data=pct3)
    plt.show()
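
As a cross-check on the per-'who' normalization, the same kind of table can be built with groupby(...).value_counts(normalize=True), which normalizes within each group directly. This is only a sketch on made-up toy data; toy and pct_by_who are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'class' and 'who' columns
toy = pd.DataFrame({
    "who":   ["man", "man", "man", "man", "woman", "woman", "child", "child"],
    "class": ["First", "Third", "Third", "Third", "First", "Second", "Third", "Third"],
})

# Share of each class within each 'who' group
pct_by_who = (
    toy.groupby("who")["class"]
    .value_counts(normalize=True)
    .rename("percent")   # give the value column a stable name
    .reset_index()
)

# Within every 'who' group the shares should add up to 1
print(pct_by_who.groupby("who")["percent"].sum())
```

Because each 'who' group sums to 1, this matches what the question asks for: the bars of one hue add up to 100% across the classes.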
    


    You can also change the groupings by swapping the hue and x arguments in the sns.barplot call. In my view, this second option is a bit more intuitive.

    # [3b] Grouped by 'who'; hue by 'class'
    # In my view, this is a bit more intuitive; each grouping sums to 100%, 
    # and you can compare across class for men, women, and children more easily
    sns.barplot(x='who', hue='class', y='percent', data=pct3)
    plt.show()
    
