pandas, matplotlib, histogram

Pandas histogram df.hist() group by


How to plot a histogram with pandas DataFrame.hist() using group by? I have a data frame with 5 columns: "A", "B", "C", "D" and "Group"

There are two Group classes: "yes" and "no"

Using:

df.hist() 

I get the hist for each of the 4 columns.


Now I would like to get the same 4 graphs but with blue bars (group="yes") and red bars (group = "no").

I tried this without success:

df.hist(by = "group")



Solution

  • This is not the most flexible workaround but will work for your question specifically.

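    import matplotlib.pyplot as plt

    # Assumes df has columns 'a', 'b', 'c', 'd' and a 'group' column
    def sephist(col):
        # Split one column into the values belonging to each group
        yes = df[df['group'] == 'yes'][col]
        no = df[df['group'] == 'no'][col]
        return yes, no

    # Subplot indices are 1-based, so start counting at 1
    for num, alpha in enumerate('abcd', start=1):
        plt.subplot(2, 2, num)
        yes, no = sephist(alpha)
        plt.hist(yes, bins=25, alpha=0.5, label='yes', color='b')
        plt.hist(no, bins=25, alpha=0.5, label='no', color='r')
        plt.legend(loc='upper right')
        plt.title(alpha)
    plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)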
    


    You could make this more generic by:

    • adding a df and by parameter to sephist: def sephist(df, by, col)
    • making the subplots loop more flexible: for num, alpha in enumerate(df.columns.drop(by), start=1)
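
The two suggestions above could be sketched roughly as follows. This is only one possible way to generalize the answer, not the original author's code: `plot_grouped_hists`, its `colors`, `bins` and `ncols` parameters, and the random `demo` frame are all hypothetical names and data made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def sephist(df, by, col):
    """Return {group label: values of `col`} for every group in df[by]."""
    return {label: part[col] for label, part in df.groupby(by)}


def plot_grouped_hists(df, by, colors=None, bins=25, ncols=2):
    """Overlay one histogram per group for every column except `by`.

    Hypothetical helper: `colors` optionally maps group label -> color.
    """
    cols = [c for c in df.columns if c != by]
    nrows = -(-len(cols) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, cols):
        for label, values in sephist(df, by, col).items():
            color = colors.get(label) if colors else None
            ax.hist(values, bins=bins, alpha=0.5, label=str(label), color=color)
        ax.legend(loc='upper right')
        ax.set_title(col)
    # Hide panels left over when the grid has more cells than columns
    for ax in flat_axes[len(cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Made-up sample data, only to exercise the helper
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=list('abcd'))
demo['group'] = rng.choice(['yes', 'no'], size=200)

fig, axes = plot_grouped_hists(demo, by='group', colors={'yes': 'b', 'no': 'r'})
```

Call plt.show() afterwards (or fig.savefig(...)) to see the result. Using the Axes objects returned by plt.subplots avoids having to track 1-based subplot indices by hand.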

    Because the first argument to matplotlib.pyplot.hist can take

    either a single array or a sequence of arrays which are not required to be of the same length

    ...an alternative would be:

    # Subplot indices are 1-based, so start counting at 1
    for num, alpha in enumerate('abcd', start=1):
        plt.subplot(2, 2, num)
        # Pass both groups in one call; colors follow the label order (yes=blue, no=red)
        plt.hist(sephist(alpha), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
        plt.legend(loc='upper right')
        plt.title(alpha)
    plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
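
To see the "sequence of arrays of different lengths" behavior in isolation, here is a small standalone sketch. The two arrays are made-up random data and the sizes 120 and 45 are arbitrary; it only illustrates that a single hist call bins both datasets on shared bin edges and returns one row of counts per dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
yes = rng.normal(0.0, 1.0, size=120)  # made-up "yes" values
no = rng.normal(1.0, 1.0, size=45)    # made-up "no" values, different length

fig, ax = plt.subplots()
# One call, two datasets: bars are drawn side by side on shared bin edges
counts, edges, _ = ax.hist([yes, no], bins=10, label=['yes', 'no'], color=['b', 'r'])
ax.legend(loc='upper right')

# counts holds one row per dataset; edges holds bins + 1 values
print(len(counts), len(edges))
```

Note that with this form the bars of the two groups sit next to each other inside each bin rather than overlapping, so alpha matters less than in the two-call version.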
    
