Tags: python, matplotlib, seaborn, histogram, density-plot

Plot a histogram such that the total height equals 1


This is a follow-up question to this answer. I'm trying to plot a normed histogram, but instead of getting 1 as the maximum value on the y axis, I'm getting different numbers.

For the array k=(1,4,3,1):

    import numpy as np
    import matplotlib.pyplot as plt

    def plotGraph():
        k = (1, 4, 3, 1)

        # normed=1 in older Matplotlib; the argument is now called density
        plt.hist(k, density=True)

        plt.xticks(np.arange(10))  # 10 ticks on x axis

        plt.show()

    plotGraph()

I get this histogram, which doesn't look normalized:

[Image: resulting histogram for k=(1,4,3,1)]

For a different array, k=(3,3,3,3):

    import numpy as np
    import matplotlib.pyplot as plt

    def plotGraph():
        k = (3, 3, 3, 3)

        # normed=1 in older Matplotlib; the argument is now called density
        plt.hist(k, density=True)

        plt.xticks(np.arange(10))  # 10 ticks on x axis

        plt.show()

    plotGraph()

I get this histogram, whose maximum y-value is 10.

[Image: resulting histogram for k=(3,3,3,3)]

For different k, I get a different maximum y-value even though I pass normed=1 (or normed=True).

Why does the normalization (if it works at all) change with the data, and how can I make the maximum y-value equal 1?

UPDATE:

I am trying to implement Carsten König's answer from "plotting histograms whose bar heights sum to 1 in matplotlib" and am getting a very weird result:

    import numpy as np
    import matplotlib.pyplot as plt

    def plotGraph():
        k = (1, 4, 3, 1)

        weights = np.ones_like(k) / len(k)
        plt.hist(k, weights=weights)

        plt.xticks(np.arange(10))  # 10 ticks on x axis

        plt.show()

    plotGraph()

Result:

[Image: resulting histogram with weights]

What am I doing wrong?
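A likely culprit, assuming the snippet above was run under Python 2: `np.ones_like(k)` inherits the integer dtype of `k`, and in Python 2 dividing an integer array by the integer `len(k)` is floor division, so every weight becomes 0. Under Python 3, `/` is true division and the weights come out as 0.25. The minimal NumPy-only sketch below forces a float dtype and checks that the weighted bar heights sum to 1; it assumes `plt.hist` bins a single dataset the same way `np.histogram` does with its default of 10 bins.

```python
import numpy as np

k = (1, 4, 3, 1)

# np.ones_like(k) would inherit k's integer dtype; under Python 2,
# int_array / len(k) is floor division and every weight becomes 0.
# Requesting a float dtype explicitly avoids that on any Python version.
weights = np.ones(len(k), dtype=float) / len(k)

# Weighted histogram with NumPy's default of 10 equal-width bins
# (assumed to match what plt.hist(k, weights=weights) draws).
heights, bin_edges = np.histogram(k, weights=weights)

print(heights)        # fraction of the samples that falls in each bin
print(heights.sum())  # 1.0
```

Passing the same `weights` array to `plt.hist(k, weights=weights)` should then draw bars of height 0.5, 0.25 and 0.25 in the bins containing 1, 3 and 4.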


Solution

  • When plotting a normalized histogram (density=True), it is the total area of the bars (height times bin width, summed over all bins) that equals 1, not the total height.

    import numpy as np
    import matplotlib.pyplot as plt

    k = (3, 3, 3, 3)
    x, bins, p = plt.hist(k, density=True)  # used to be normed=True in older versions
    plt.xticks(np.arange(10))  # 10 ticks on x axis
    plt.show()

    print(bins)
    # [2.5 2.6 2.7 2.8 2.9 3.  3.1 3.2 3.3 3.4 3.5]
    

    In this example the bin width is 0.1 and the only non-empty bar has height 10, so the area under the histogram is 0.1 * 10 = 1.

    x stores the height of each bin, and p stores the matching bar objects (they are Rectangle patches). So we sum up x and divide the height of each bar by that total.

    To make the bar heights sum to 1, add the following before plt.show():

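    for item in p:
        item.set_height(item.get_height() / x.sum())
    plt.gca().relim()           # recompute data limits from the resized bars
    plt.gca().autoscale_view()  # so the y axis shrinks to the new heights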
    

    [Image: histogram with bar heights rescaled to sum to 1]
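To see why the peak height depends on the data while the area does not, here is a NumPy-only sketch. It assumes `plt.hist(k, density=True)` uses the same binning as `np.histogram(k, density=True)` for a single dataset with the default 10 bins; `density_heights` is a hypothetical helper name. With `density=True` each height is count / (n * bin width), so narrower bins give taller bars: for k=(1,4,3,1) the bins are 0.3 wide, while for k=(3,3,3,3) they are 0.1 wide. Multiplying each height by its bin width gives the fraction of samples per bar; for equal-width bins this matches what the patch-rescaling loop above produces.

```python
import numpy as np


def density_heights(data, bins=10):
    """Hypothetical helper: bar heights and bin edges for a density histogram.

    Assumed to match what plt.hist(data, bins=bins, density=True) draws
    for a single dataset.
    """
    return np.histogram(data, bins=bins, density=True)


results = {}
for k in [(1, 4, 3, 1), (3, 3, 3, 3)]:
    heights, edges = density_heights(k)
    widths = np.diff(edges)

    area = np.sum(heights * widths)  # density=True normalizes this to 1
    shares = heights * widths        # fraction of samples per bar; sums to 1
    results[k] = (heights.max(), area, shares.sum())
    print(f"k={k}: max height={heights.max():.3f}, "
          f"area={area:.3f}, sum of shares={shares.sum():.3f}")
```

The loop should report a tallest bar of about 1.667 for the first array and 10 for the second, with both areas and both share totals equal to 1.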