Frequency Density Graph using pandas


I have a pandas DataFrame created as shown below:

import numpy as np
import pandas as pd
# DataFrame.append was removed in pandas 2.0, so build the column in one go
df_hist2 = pd.DataFrame({'Score': np.concatenate([
    np.random.uniform(0, 1, 4300),
    np.random.uniform(1, 3, 6900),
    np.random.uniform(3, 5, 4900),
    np.random.uniform(5, 10, 2000),
    np.random.uniform(10, 24, 2100),
])})

I can create a histogram from it as shown below:

df_hist2.plot.hist(bins=[0,1,3,5,10,24], edgecolor='black', linewidth=1.2)

It looks something like this:

[image: histogram of raw frequencies per bin]

However, I want to create a histogram that shows the frequency density instead of the raw frequencies, where

Frequency Density = Frequency / Width of the bin

I could plot a bar graph with a category for each bin ('0-1', '1-3', etc.) and then calculate the densities manually. However, is there a more elegant and easier way to do this?

Moreover, the bar graph approach would require me to first calculate the frequencies from the data as well. (In this case I know them because I am generating the data myself, but I would not know them for real data.)

What I want is something that calculates and plots the following:

Score   | Frequency | Width | Density       |
--------------------------------------------|
0 - 1   | 4300      | 1     | 4300/1 = 4300 |
1 - 3   | 6900      | 2     | 6900/2 = 3450 |
3 - 5   | 4900      | 2     | 4900/2 = 2450 |
5 - 10  | 2000      | 5     | 2000/5 = 400  |
10 - 24 | 2100      | 14    | 2100/14 = 150 |
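
As a rough sketch of how the table above could be computed from data rather than by hand, assuming NumPy's np.histogram is acceptable for the counting step. The helper name frequency_density_table and the small sample list are hypothetical, introduced only for illustration:

```python
import numpy as np
import pandas as pd

def frequency_density_table(values, edges):
    """Per-bin frequency, width and frequency density (hypothetical helper)."""
    edges = np.asarray(edges, dtype=float)
    # np.histogram counts the values falling into each [left, right) bin;
    # the last bin also includes its right edge.
    counts, _ = np.histogram(values, bins=edges)
    widths = np.diff(edges)
    return pd.DataFrame({
        'Bin': [f'{lo:g} - {hi:g}' for lo, hi in zip(edges[:-1], edges[1:])],
        'Frequency': counts,
        'Width': widths,
        'Density': counts / widths,  # Frequency / Width of the bin
    })

# Small hand-made sample for illustration (not the question's random data)
sample = [0.2, 0.5, 1.5, 2.0, 2.5, 2.9, 4.0, 7.0, 12.0, 20.0]
table = frequency_density_table(sample, [0, 1, 3, 5, 10, 24])
print(table)
```

Passing df_hist2['Score'] instead of sample should reproduce the Frequency and Density columns of the table above, since each generated value falls inside the interval it was drawn from.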

And a plot that looks similar to the following (done in Excel with some manual editing).

Note: The width of each interval/bin is preserved. The height is changed to reflect the frequency density.

[image: target histogram with bin widths preserved and bar heights equal to frequency density]


Solution

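  • Here's an example that may get you near what you want, using the histogram parameter density=True (called normed in older matplotlib releases, where it has since been removed). It gives the desired histogram shape: each bar's height is Frequency / (N × Width), where N is the total number of rows. Rescaling the y tick labels by N = len(df_hist2), with matplotlib.pyplot imported as plt, then gives the frequency density:

    import matplotlib.pyplot as plt
    ax = df_hist2.plot.hist(bins=[0,1,3,5,10,24], edgecolor='black', linewidth=1.2,
                            density=True)
    # Relabel the y ticks as density * N, i.e. Frequency / Width
    ticks = ax.get_yticks()
    plt.yticks(ticks, [f'{t * len(df_hist2):.0f}' for t in ticks])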

    You can further customize the exact values of the y ticks to your liking.

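
If relabeling ticks feels indirect, a different approach (not the one in the answer above) is to compute the densities with np.histogram and draw them as bars whose widths match the bins, so the y axis carries the real values. This is only a sketch under those assumptions; plot_frequency_density is a hypothetical helper name, and the data is regenerated with a seeded generator so the example stands on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_frequency_density(values, edges, ax=None):
    """Draw bars of height count / bin width (hypothetical helper)."""
    edges = np.asarray(edges, dtype=float)
    counts, _ = np.histogram(values, bins=edges)
    widths = np.diff(edges)
    density = counts / widths
    if ax is None:
        _, ax = plt.subplots()
    # align='edge' starts each bar at its left bin edge, so the bar
    # widths on the x axis match the bin widths.
    ax.bar(edges[:-1], density, width=widths, align='edge',
           edgecolor='black', linewidth=1.2)
    ax.set_xlabel('Score')
    ax.set_ylabel('Frequency density')
    return ax, density

# Same interval sizes as in the question, with a fixed seed
rng = np.random.default_rng(0)
scores = np.concatenate([
    rng.uniform(0, 1, 4300),
    rng.uniform(1, 3, 6900),
    rng.uniform(3, 5, 4900),
    rng.uniform(5, 10, 2000),
    rng.uniform(10, 24, 2100),
])
ax, density = plot_frequency_density(scores, [0, 1, 3, 5, 10, 24])
# Call plt.show() to display the figure in an interactive session.
```

The returned density array should match the Density column in the question's table, and the y axis needs no rescaling because the bar heights are already Frequency / Width.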