Search code examples
pythonpandashistogram

Plotting histograms from grouped data in a pandas DataFrame


How do I plot a block of histograms from a group of data in a dataframe? For example, given:

from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter': x, 'N': y})

I tried:

df.groupby('Letter').hist()

...which failed with the error message:

TypeError: cannot concatenate 'str' and 'float' objects


Solution

  • I'm on a roll, just found an even simpler way to do it using the by keyword in the hist method:

    df.hist('N', by='Letter')
    

    That's a very handy little shortcut for quickly scanning your grouped data!

    For future visitors, the product of this call is the following chart:

    output of hist plot command

    In answer to questions below, here's an example of specific tailoring of the histogram plots:

    # import libraries
    import pandas as pd
    import numpy as np
    
    # Create test dataframe
    x = ['A']*300 + ['B']*400 + ['C']*300
    y = np.random.randn(1000)
    z = np.random.randn(1000)
    df = pd.DataFrame({'Letter':x, 'N1':y, 'N2':z})
    
    # Plot histograms
    axes = df.hist(['N1','N2'], by='Letter',bins=10, layout=(2,2),
                   legend=True, yrot=90,sharex=True,sharey=True, 
                   log=True, figsize=(6,6))
    for ax in axes.flatten():
        ax.set_xlabel('N')
        ax.set_ylabel('Count')
        ax.set_ylim(bottom=1,top=100)
    

    enter image description here