python pandas matplotlib plot histogram

Create histogram for grouped column


How can I create a plot with one row and three columns where in each column I plot a histogram? The data comes from this DataFrame:

import pandas as pd
import matplotlib.pyplot as plt
d = {'col1': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C'], 
     'col2': [3, 4, 3, 4, 6, 7, 8, 9, 3, 2, 3, 4, 5, 3, 4, 1, 2, 6 ]}
df = pd.DataFrame(data=d)

The DataFrame has three groups (A, B, C), but in general there could be N groups, and I still want a single figure with one row in which each column holds the histogram for one group.


Solution

  • You can pivot your data frame and chain the plot command to produce the figure.

    import pandas as pd
    import matplotlib.pyplot as plt
    
    d = {'Category': ['A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C','C','C'], 
         'Values': [3, 4, 3, 4, 6, 7, 8, 9, 3, 2, 3, 4, 5, 3, 4, 1, 2, 2 ]}
    df = pd.DataFrame(d)
    
    df.pivot(columns='Category', values='Values').plot(kind='hist', subplots=True, rwidth=0.9, align='mid')
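
As a hedged variation on the pivot-and-plot call above: `DataFrame.plot` accepts a `layout` argument together with `subplots=True`, so the panels can be requested in a single row directly. The sketch below assumes the same `Category`/`Values` columns; the function name `plot_pivot_histograms` and the `figsize` scaling are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_pivot_histograms(df, group_col, value_col):
    """Pivot to one column per group and draw the histograms in one row."""
    wide = df.pivot(columns=group_col, values=value_col)
    n_groups = wide.shape[1]
    axes = wide.plot(
        kind='hist',
        subplots=True,
        layout=(1, n_groups),       # one row, one column per group
        sharey=True,                # common y-axis so counts are comparable
        figsize=(4 * n_groups, 3),  # widen the figure as groups are added (assumed scaling)
        rwidth=0.9,
    )
    # Flatten so the caller always gets a 1-D array of Axes
    return np.ravel(axes)
```

Calling `plot_pivot_histograms(df, 'Category', 'Values')` followed by `plt.show()` should display the three panels side by side; the number of columns follows the number of categories rather than being fixed at three.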
    


    Edit: You can use the code below to plot all subplots in one row. However, for more than three categories the plots start looking very squashed.

    import numpy as np
    
    df2 = df.pivot(columns='Category', values='Values')
    color = ['blue', 'green', 'red']
    idx = np.arange(1, 4)
    plt.subplots(1, 3)
    for i, col, colour in zip(idx, df2.columns, color):
        plt.subplot(1, 3, i)
        # Same range and bin count in every panel so the histograms are comparable
        df2.loc[:, col].plot.hist(label=col, color=colour, range=(df['Values'].min(), df['Values'].max()), bins=11)
        plt.yticks(np.arange(3))
        plt.legend()
    

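
The one-row loop above is written for exactly three categories. A minimal sketch of how it might be generalized to N groups is shown here, using `groupby` instead of `pivot` and deriving the number of columns from the data. The helper name `hist_per_group`, the default `bins=10`, and the `figsize` scaling are assumptions made for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt


def hist_per_group(df, group_col, value_col, bins=10):
    """Draw one histogram per group, side by side in a single row."""
    groups = df.groupby(group_col)[value_col]
    n_groups = groups.ngroups

    # squeeze=False keeps `axes` two-dimensional even for a single group
    fig, axes = plt.subplots(
        1, n_groups,
        figsize=(3 * n_groups, 3),  # grow the width with the number of groups
        sharex=True, sharey=True,
        squeeze=False,
    )

    # Common range so every panel uses the same bin edges
    value_range = (df[value_col].min(), df[value_col].max())

    for ax, (name, values) in zip(axes[0], groups):
        ax.hist(values, bins=bins, range=value_range, rwidth=0.9)
        ax.set_title(str(name))
        ax.set_xlabel(value_col)

    axes[0, 0].set_ylabel('Frequency')
    fig.tight_layout()
    return fig, axes[0]
```

With the DataFrame from the answer, `fig, axes = hist_per_group(df, 'Category', 'Values')` followed by `plt.show()` should give one panel per category. Scaling `figsize` with the group count is one way to reduce the squashed look with many groups, though beyond a handful of groups wrapping onto several rows may read better.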