
Creating Histogram with Additional Data Element


I have a dataframe that summarizes the quantity (count) and value of different categories. I need to visualize it to show how many categories fall into each range of quantities, and what total value the categories in each range account for.

Sample dataframe to use:

import pandas as pd

df = pd.DataFrame({'cat': ['A','B','C','D','E','F','G','H','I','J'],
                   'count': [5,10,50,20,3,18,28,93,42,31],
                   'value': [100,245,890,510,85,690,730,2470,1870,1180],
                  })

I created the histogram for counts using this:

df.plot(kind='hist',y='count',bins=[0,20,40,60,80,100])

This shows how the categories ('cat') are distributed across the classes (bins) of the 'count' variable.

Now, for each class, I need to show the total of 'value' on the same chart: either as a number shown against each histogram bar, or as a line on a secondary y-axis on the right of the same chart (axes).

This would let me show, for example, that the categories with a count in the 0-20 class (A, B, E and F) have a total value of 1120 (100 + 245 + 85 + 690).
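The first option (the total value written against each histogram bar) can be sketched as below, assuming matplotlib 3.4 or later for `Axes.bar_label`. It reuses the histogram's bin edges and calls `numpy.histogram` with `weights=df['value']`, which sums `value` per bin instead of counting rows. The name `bins` is illustrative. Both `ax.hist` and `numpy.histogram` use bins that are closed on the left (the last one is closed on both sides), so D (count 20) lands in the 20-40 class in both and the labels line up with the bars.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'cat': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'count': [5, 10, 50, 20, 3, 18, 28, 93, 42, 31],
                   'value': [100, 245, 890, 510, 85, 690, 730, 2470, 1870, 1180]})
bins = [0, 20, 40, 60, 80, 100]  # same edges as the original histogram

fig, ax = plt.subplots(figsize=(8, 5))
# n = number of categories per bin; patches = the BarContainer drawn by hist
n, edges, patches = ax.hist(df['count'], bins=bins, edgecolor='black')

# Weighting by 'value' turns per-bin counts into per-bin sums of 'value'
value_sums, _ = np.histogram(df['count'], bins=bins, weights=df['value'])

# Write each bin's total value on top of its bar
ax.bar_label(patches, labels=[f"{int(v):,}" for v in value_sums])
ax.set_xlabel('count')
ax.set_ylabel('number of categories')
ax.set_title('Categories per count class (label = total value)')
plt.show()
```

With the sample data the first bar (4 categories) is labelled 1,120, i.e. value(A + B + E + F).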

Also, feel free to suggest a chart type other than a histogram if it would show this better.
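Because the bins are fixed in advance, a plain bar chart of the per-class numbers is arguably easier to read than a histogram, and it makes the secondary-axis option straightforward. The sketch below assumes the same bin edges as the original histogram; the class labels such as "0-20" are illustrative. Bars on the left axis show the number of categories per class, and a line on a `twinx()` axis on the right shows each class's total value.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'cat': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'count': [5, 10, 50, 20, 3, 18, 28, 93, 42, 31],
                   'value': [100, 245, 890, 510, 85, 690, 730, 2470, 1870, 1180]})
bins = [0, 20, 40, 60, 80, 100]  # same edges as the original histogram

# Per class: number of categories, and total 'value' (weights turn counting into summing)
counts, _ = np.histogram(df['count'], bins=bins)
value_sums, _ = np.histogram(df['count'], bins=bins, weights=df['value'])

x = np.arange(len(counts))  # one bar position per class
labels = [f"{lo}-{hi}" for lo, hi in zip(bins[:-1], bins[1:])]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(x, counts, color='tab:blue')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_xlabel('count class')
ax.set_ylabel('number of categories', color='tab:blue')

# twinx() adds a second y-axis on the right that shares the same x positions
ax2 = ax.twinx()
ax2.plot(x, value_sums, color='tab:red', marker='o')
ax2.set_ylabel('total value', color='tab:red')
fig.tight_layout()
plt.show()
```

Keeping the two series on separate y-axes stops the large value totals (thousands) from flattening the small category counts (single digits).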


Solution

  • I used the pandas.cut() method to create the bins manually and built a second dataframe that aggregates the original one.

    This is the closest I could get, but it still does not give a clear visualization of what I want to show.

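    import matplotlib.pyplot as plt
    import pandas as pd

    # Classes of width 10; an upper edge of 100 keeps H (count 93) from being dropped as NaN
    df['Bins'] = pd.cut(df['count'], bins=range(0, 110, 10))
    df1 = df.groupby('Bins', observed=False).agg(cats=('cat', 'count'), value=('value', 'sum'))
    df1.plot(kind='bar', subplots=True, figsize=(15, 8))
    plt.show()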
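One reason the aggregated chart is hard to compare with the histogram is that `pd.cut` defaults to `right=True`, so its classes are closed on the right, e.g. (10, 20], while the histogram's bins are closed on the left, e.g. [0, 20); D (count 20) therefore lands in different classes in the two charts. A minimal sketch that reuses the histogram's edges with `right=False` and pandas named aggregation (pandas 0.25 or later) is below; the column names `class`, `categories` and `total_value` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cat': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                   'count': [5, 10, 50, 20, 3, 18, 28, 93, 42, 31],
                   'value': [100, 245, 890, 510, 85, 690, 730, 2470, 1870, 1180]})
bins = [0, 20, 40, 60, 80, 100]  # the histogram's edges

# right=False gives [0, 20), [20, 40), ...; this matches the histogram except that
# numpy also closes its last bin [80, 100], which makes no difference here (max count is 93)
df['class'] = pd.cut(df['count'], bins=bins, right=False)

# observed=False keeps empty classes such as [60, 80) in the result
summary = df.groupby('class', observed=False).agg(
    categories=('cat', 'count'),
    total_value=('value', 'sum'),
)
print(summary)
```

The [0, 20) row shows 4 categories with a total value of 1120, the figure quoted in the question, and `summary` can be plotted with either of the chart styles discussed above.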