python · entropy · information-theory · scipy.stats

3 functions for computing relative entropy in scipy. What's the difference?


SciPy in Python offers the following functions that all seem to compute the same information-theoretic measure, the Kullback-Leibler divergence, which is also called relative entropy:

  • scipy.stats.entropy, which can be switched to computing the KL-divergence by passing a second distribution qk (with the default qk=None it returns the Shannon entropy); see the short sketch after this list
  • scipy.special.rel_entr
  • scipy.special.kl_div
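
A minimal sketch of the switching behaviour I mean for scipy.stats.entropy (p and q are just made-up example vectors):

    import numpy as np
    from scipy.stats import entropy

    p = np.array([0.5, 0.5])
    q = np.array([0.1, 0.9])

    # without qk: Shannon entropy of p, here ln(2) ~ 0.6931
    print(entropy(p))

    # with qk: Kullback-Leibler divergence D(p || q), here ~ 0.5108
    print(entropy(p, qk=q))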

Why three of the same thing? Could someone explain the difference between them?


Solution

  • The standard choice for computing the KL-divergence between two discrete probability vectors is scipy.stats.entropy.

    In contrast, both scipy.special.rel_entr and scipy.special.kl_div are "element-wise functions" that can be used in conjunction with the usual array operations; their outputs have to be summed before they yield the aggregate relative entropy value.

    While both result in the same sum (when used with proper probability vectors whose elements sum to 1), the second variant (scipy.special.kl_div) differs element-wise in that it adds -x + y to each term, i.e., it computes

    x log(x/y) - x + y

    per element. These extra terms cancel in the sum, because for probability vectors the -x terms add up to -1 and the +y terms add up to +1.

    For example:

    from numpy import array
    from scipy.stats import entropy
    from scipy.special import rel_entr, kl_div

    p = array([1/2, 1/2])
    q = array([1/10, 9/10])

    # aggregate KL divergence D(p || q)
    print(entropy(p, q))
    # element-wise values followed by their sums
    print(rel_entr(p, q), sum(rel_entr(p, q)))
    print(kl_div(p, q), sum(kl_div(p, q)))
    

    yields

    0.5108256237659907
    [ 0.80471896 -0.29389333] 0.5108256237659907
    [0.40471896 0.10610667] 0.5108256237659906
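
    The equality of the two sums relies on the -x + y terms cancelling, which only happens when the two vectors have the same total mass (as proper probability vectors do). A small sketch with made-up, unnormalized vectors where the two sums therefore differ:

    import numpy as np
    from scipy.special import rel_entr, kl_div

    # strictly positive vectors with different totals (4 vs. 3)
    x = np.array([1.0, 3.0])
    y = np.array([2.0, 1.0])

    print(rel_entr(x, y).sum())  # sum of x*log(x/y), ~2.6027
    print(kl_div(x, y).sum())    # adds the -x + y terms, ~1.6027 (off by sum(y) - sum(x) = -1)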
    

    I am not familiar with the rationale behind the element-wise extra terms of scipy.special.kl_div, but the documentation points to a reference that might explain more.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kl_div.html#scipy.special.kl_div
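
    One practical difference produced by the extra terms, and already visible in the element-wise output above: kl_div values are non-negative for any positive inputs, whereas individual rel_entr values can be negative. A quick check with the same p and q:

    from numpy import array
    from scipy.special import rel_entr, kl_div

    p = array([1/2, 1/2])
    q = array([1/10, 9/10])

    print((kl_div(p, q) >= 0).all())    # True: x*log(x/y) - x + y >= 0 for positive x, y
    print((rel_entr(p, q) >= 0).all())  # False: the second element above is negative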