
Shannon's Entropy on an array containing zeros


I use the following code to compute the Shannon entropy of an array that represents a probability distribution.

import numpy as np

A = np.random.randint(10, size=10)

pA = A / A.sum()  # normalize the counts to probabilities
Shannon2 = -np.sum(pA*np.log2(pA))
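For reference, the code computes the standard Shannon entropy in bits, with the counts in A normalized to probabilities p_i. By the usual convention, a term with p_i = 0 counts as 0, because p log2 p tends to 0 as p approaches 0 from above. NumPy does not apply this convention by itself, since it evaluates 0 · log2(0) literally.

```latex
H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i,
\qquad p_i = \frac{A_i}{\sum_{j=1}^{n} A_j}

\lim_{p \to 0^{+}} p \log_2 p = 0
\quad\Longrightarrow\quad
H(p) = -\sum_{i \,:\, p_i > 0} p_i \log_2 p_i
```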

This works fine as long as the array doesn't contain any zeros.

Example:

Input: [2 3 3 3 2 1 5 3 3 4]
Output: 3.2240472715

However, if the array does contain zeros, the entropy comes out as nan.

Example:

Input: [7 6 6 8 8 2 8 3 0 7]
Output: nan

I do get two RuntimeWarnings:

1) RuntimeWarning: divide by zero encountered in log2

2) RuntimeWarning: invalid value encountered in multiply
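The two warnings correspond to the two steps that produce the nan. Here is a minimal sketch on a made-up three-element distribution (the array p is purely illustrative). np.log2(0) evaluates to -inf, which raises the first warning. Multiplying that -inf by the zero probability gives nan, which raises the second. np.errstate is used here only to silence the warnings while the intermediate values are inspected.

```python
import numpy as np

# Illustrative distribution containing one zero probability.
p = np.array([0.5, 0.0, 0.5])

# Silence the warnings so the intermediate values can be inspected.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log2(p)  # log2(0) is -inf: the "divide by zero encountered in log2" step
    terms = p * logs   # 0 * -inf is nan: the "invalid value encountered in multiply" step

entropy = -np.sum(terms)  # a single nan makes the whole sum nan
print(logs, terms, entropy)
```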

Is there a way to alter the code so that it includes zeros? I'm not sure whether removing them completely would influence the result; specifically, whether the variation would be greater because of the greater frequency in the distribution.


Solution

  • I think you want to use np.nansum, which treats NaNs as zero:

    import numpy as np

    A = np.random.randint(10, size=10)
    pA = A / A.sum()
    # nansum treats the nan produced by 0 * log2(0) as 0
    Shannon2 = -np.nansum(pA*np.log2(pA))
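A minimal sketch of an alternative, assuming the input is an array of non-negative counts with a positive sum: mask out the zero probabilities before taking the logarithm instead of letting them turn into nan. The helper name shannon_entropy is hypothetical. Because p log2 p tends to 0 as p tends to 0, zero entries contribute nothing, so masking, nansum, and dropping the zeros from the input should all agree; the example checks this on the input from the question. Note that the nansum line still emits both RuntimeWarnings (they are silenced below with np.errstate), whereas the masked version never calls log2 on a zero.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution given by non-negative counts.

    Assumes counts has a positive sum. Zero entries are skipped, which applies
    the convention 0 * log2(0) = 0.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]  # drop zero-probability entries before taking the log
    return -np.sum(p * np.log2(p))

A = np.array([7, 6, 6, 8, 8, 2, 8, 3, 0, 7])  # input from the question

h_masked = shannon_entropy(A)

# The nansum version from the answer, with its RuntimeWarnings silenced.
pA = A / A.sum()
with np.errstate(divide="ignore", invalid="ignore"):
    h_nansum = -np.nansum(pA * np.log2(pA))

# Removing the zero from the input entirely gives the same value.
h_dropped = shannon_entropy(A[A != 0])

print(h_masked, h_nansum, h_dropped)
```

So removing the zeros does not change the result. If SciPy is available, scipy.stats.entropy(A, base=2) should give the same value, since it normalizes the counts and treats zero entries as contributing 0.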