
How to plot a histogram from grouped data with matplotlib in Python?


I know how to plot a histogram when the individual data points are given, e.g. (33, 45, 54, 33, 21, 29, 15, ...): I simply use something like matplotlib.pyplot.hist(x, bins=10).
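For reference, here is a minimal sketch of that raw-data case. `plt.hist()` does the binning itself and returns the per-bin counts, the bin edges and the drawn bars. The `Agg` backend line is an assumption so the snippet runs without a display, and the elided `...` values from the question are left out.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove it to get a window
import matplotlib.pyplot as plt

# The individual data points listed in the question (the "..." tail is omitted)
x = [33, 45, 54, 33, 21, 29, 15]

# hist() bins the raw values itself; it returns the count in each bin,
# the 11 edges of the 10 bins, and the bar artists it drew
counts, edges, patches = plt.hist(x, bins=10)

print(counts.sum())  # every data point lands in exactly one bin
plt.close()
```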

But what if I only have grouped data, like this:

| Marks | Number of students |
| ----- | ------------------ |
| 0-10  | 8                  |
| 10-20 | 12                 |
| 20-30 | 24                 |
| 30-40 | 26                 |
| ...   | ...                |

and so on.

I know I can mimic a histogram with a bar plot by adjusting the xticks, but what if I want to do this using only the hist function of matplotlib.pyplot?

Is it possible to do this?


Solution

  • You can build the hist() parameters manually and pass the existing counts as weights.

    Say you have this df:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'Marks': ['0-10', '10-20', '20-30', '30-40'], 'Number of students': [8, 12, 24, 26]})
    >>> df
       Marks  Number of students
    0   0-10                   8
    1  10-20                  12
    2  20-30                  24
    3  30-40                  26
    

    The bin edges are all the unique boundary values in Marks:

    >>> bins = pd.unique(df.Marks.str.split('-', expand=True).astype(int).values.ravel())
    >>> bins
    array([ 0, 10, 20, 30, 40])
    

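    Choose one x value per bin. The left edge is the easiest choice, but any value that falls inside the bin works, because hist() only uses it to decide which bin receives the weight: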

    >>> x = bins[:-1]
    >>> x
    array([ 0, 10, 20, 30])
    

    Use the existing value counts (Number of students) as weights:

    >>> weights = df['Number of students'].values
    >>> weights
    array([ 8, 12, 24, 26])
    

    Then plug these into hist():

    >>> import matplotlib.pyplot as plt
    >>> plt.hist(x=x, bins=bins, weights=weights)
    >>> plt.show()
    

    (Figure: reconstructed histogram)
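Putting those steps together, here is a minimal self-contained sketch. `hist_from_grouped` is a hypothetical helper, not part of matplotlib or pandas; it assumes labels of the form "lo-hi" with integer bounds, contiguous groups and ascending order. Instead of `pd.unique`, it builds the edges from the lower bounds plus the last upper bound, which gives the same edges for this data. The `Agg` backend line is also an assumption so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove it to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def hist_from_grouped(labels, counts, ax=None):
    """Draw pre-binned data with Axes.hist() by weighting one point per bin.

    Hypothetical helper: assumes every label looks like "lo-hi" with integer
    bounds, and that the groups are contiguous and sorted in ascending order.
    """
    if ax is None:
        ax = plt.gca()
    # "0-10" -> [0, 10] for every row; result has shape (n_groups, 2)
    bounds = pd.Series(labels).str.split("-", expand=True).astype(int).to_numpy()
    # Bin edges: every lower bound plus the final upper bound
    edges = np.append(bounds[:, 0], bounds[-1, 1])
    # One representative x per bin: its left edge, which always falls inside it
    x = bounds[:, 0]
    # Each representative point is counted with its group's count as weight
    return ax.hist(x, bins=edges, weights=np.asarray(counts))


df = pd.DataFrame({"Marks": ["0-10", "10-20", "20-30", "30-40"],
                   "Number of students": [8, 12, 24, 26]})

fig, ax = plt.subplots()
n, bins, patches = hist_from_grouped(df["Marks"], df["Number of students"], ax=ax)
ax.set_xlabel("Marks")
ax.set_ylabel("Number of students")
print(n)     # bar heights, equal to the original counts
print(bins)  # bin edges recovered from the labels
plt.close(fig)
```

To look at the result, drop the `Agg` line and call `plt.show()` (or `fig.savefig(...)`) instead of closing the figure.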