python numpy histogram entropy

How to calculate entropy from an np.histogram result


I have an example of a histogram with:

mu1, sigma1 = 10, 10
s1 = np.random.normal(mu1, sigma1, 100000)

and calculated

hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
for i in hist1[0]:
    ent = -sum(i * log(abs(i)))
print (ent)

Now I want to find the entropy from the given histogram array, but since np.histogram returns two arrays, I'm having trouble calculating it. How can I use only the first array returned by np.histogram to calculate the entropy? Even if my code above were correct, I would still get a math domain error.

Edit: How do I find the entropy when mu = 0 and log(0) yields a math domain error?


So the actual code I'm trying to write is:

mu1, sigma1 = 0, 1
mu2, sigma2 = 10, 1
s1 = np.random.normal(mu1, sigma1, 100000)
s2 = np.random.normal(mu2, sigma2, 100000)

hist1 = np.histogram(s1, bins=100, range=(-20,20), density=True)
data1 = hist1[0]
ent1 = -(data1*np.log(np.abs(data1))).sum() 

hist2 = np.histogram(s2, bins=100, range=(-20,20), density=True)
data2 = hist2[0]
ent2 = -(data2*np.log(np.abs(data2))).sum() 

So far, the first example, ent1, yields nan, and the second, ent2, yields a math domain error.


Solution

  • You can calculate the entropy using vectorized code:

    import numpy as np
    
    mu1 = 10
    sigma1 = 10
    
    s1 = np.random.normal(mu1, sigma1, 100000)
    hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
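    data = hist1[0]  # bin densities; hist1[1] holds the bin edges
    # note: an empty bin (density 0) turns the sum into nan, since 0 * log(0) is nan in NumPy
    ent = -(data*np.log(np.abs(data))).sum()
    print(ent)
    # example output (varies with the random draw): 7.1802159512213191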
    

    If you prefer a for loop, you can write:

    import numpy as np
    import math
    
    mu1 = 10
    sigma1 = 10
    
    s1 = np.random.normal(mu1, sigma1, 100000)
    hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
    ent = 0
    for i in hist1[0]:
        ent -= i * math.log(abs(i))  # math.log raises ValueError if a bin is empty (i == 0)
    print(ent)
    # example output (varies with the random draw): 7.1802159512213191