I have an example of a histogram with:
mu1, sigma1 = 10, 10
s1 = np.random.normal(mu1, sigma1, 100000)
and I calculated
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
for i in hist1[0]:
    ent = -sum(i * log(abs(i)))
print(ent)
Now I want to find the entropy from the given histogram array, but since np.histogram returns two arrays, I'm having trouble calculating the entropy. How can I select just the first array that np.histogram returns and calculate the entropy from it? I would also get a math domain error for the entropy even if the rest of my code above were correct. :(
**Edit:** How do I find the entropy when mu = 0, given that log(0) yields a math domain error?
So the actual code I'm trying to write is:
import numpy as np

mu1, sigma1 = 0, 1
mu2, sigma2 = 10, 1
s1 = np.random.normal(mu1, sigma1, 100000)
s2 = np.random.normal(mu2, sigma2, 100000)
hist1 = np.histogram(s1, bins=100, range=(-20,20), density=True)
data1 = hist1[0]
ent1 = -(data1*np.log(np.abs(data1))).sum()
hist2 = np.histogram(s2, bins=100, range=(-20,20), density=True)
data2 = hist2[0]
ent2 = -(data2*np.log(np.abs(data2))).sum()
So far, the first example, ent1, yields nan, and the second, ent2, gives a math domain error. :(
You can calculate the entropy using vectorized code:
import numpy as np
mu1 = 10
sigma1 = 10
s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
data = hist1[0]  # first array: the bin heights (hist1[1] holds the bin edges)
ent = -(data*np.log(np.abs(data))).sum()
# output: 7.1802159512213191
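Since np.histogram returns a pair of arrays (the bin heights and the bin edges), you can also unpack the result directly instead of indexing into it. This is just a small variation on the code above, with counts and bin_edges as illustrative names:
counts, bin_edges = np.histogram(s1, bins=50, range=(-10,10), density=True)
ent = -(counts*np.log(np.abs(counts))).sum()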
But if you prefer to use a for loop, you can write:
import numpy as np
import math
mu1 = 10
sigma1 = 10
s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
ent = 0
for i in hist1[0]:
    ent -= i * math.log(abs(i))
print(ent)
# output: 7.1802159512213191
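As for the math domain error in the edit: with mu1 = 0, sigma1 = 1 and range=(-20,20), most of the 100 bins are empty, so their heights are 0 and log(0) fails (math.log raises a math domain error on 0, while np.log returns -inf and 0 * -inf gives the nan you saw). Since the 0*log(0) terms contribute nothing to the entropy by the usual convention, one option is to drop the zero-height bins before taking the log. A minimal sketch along those lines, reusing the first histogram from your edit:
import numpy as np

mu1, sigma1 = 0, 1
s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=100, range=(-20,20), density=True)
data1 = hist1[0]

# Empty bins have height 0; by the 0*log(0) = 0 convention they add
# nothing to the sum, so drop them before taking the log.
nonzero = data1[data1 > 0]
ent1 = -(nonzero*np.log(nonzero)).sum()
print(ent1)
The same filtering works for data2, and it also makes np.abs unnecessary, since with density=True the bin heights are never negative.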