
sklearn KernelDensity score_samples giving values greater than 0


I am using sklearn's KernelDensity to estimate a density and then evaluating the pdf at some points with score_samples. The values returned by score_samples are much greater than 0, which I did not expect, because according to the documentation it returns log(density). [Documentation: The array of log(density) evaluations. These are normalized to be probability densities, so values will be low for high-dimensional data.]

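# KernelDensity is imported from sklearn.neighbors; the old
# sklearn.neighbors.kde module path no longer exists in recent releases.
from sklearn.neighbors import KernelDensity
import numpy as np

data = np.random.normal(0, 1, [50, 10])  # 50 data points, dimension = 10
data_kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(data)
output = data_kde.score_samples(data)  # log-density at each training point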

#print(output)
output = array([19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645])

Since the density lies in [0, 1], log(density) should lie in (-Inf, 0], unlike the 19.9448 shown above.


Solution

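  • Probability densities do not have to lie in [0, 1]. A density is not a probability: only its integral over a region is a probability, and only that integral is bounded by 1. For example, a uniform distribution on [0, 0.1] has density 10 everywhere on that interval. A density greater than 1 has a positive log, so positive values from score_samples are not an error. Here the kernel is very narrow (bandwidth 0.2 in 10 dimensions) and the points are far apart, so each training point effectively sees only its own kernel. Under the usual Gaussian normalization its density is then about (2*pi*h^2)^(-d/2) / n, which for h = 0.2, d = 10, n = 50 is roughly 19.94 (log about 2.99). That matches the 19.9448 printed in the question and explains why every entry is identical. The Wikipedia page gives a good overview of pdfs.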

    https://en.wikipedia.org/wiki/Probability_density_function
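
To make the point concrete, here is a minimal sketch of a 1-D Gaussian kernel density estimate written by hand with only the standard library. It is not sklearn's implementation. The sample points, the bandwidth, and the helper names gaussian_pdf and kde_density are assumptions chosen for illustration. It shows that the estimated density at a training point can exceed 1 (so its log is positive) while the density still integrates to about 1.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D normal distribution evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def kde_density(x, samples, bandwidth):
    """Gaussian KDE: the average of one Gaussian bump per sample."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)

samples = [0.0, 0.05, 0.1]  # hypothetical 1-D training points
bandwidth = 0.02            # narrow kernel, so each bump is tall

# Density and log-density at one of the training points.
peak = kde_density(0.05, samples, bandwidth)
log_peak = math.log(peak)

# Riemann-sum approximation of the integral of the density over [-1, 1].
step = 1e-4
grid = [-1.0 + i * step for i in range(20001)]
area = sum(kde_density(x, samples, bandwidth) for x in grid) * step

print("density at 0.05:", peak)
print("log-density at 0.05:", log_peak)
print("approximate integral:", area)
```

The narrower the bandwidth and the higher the dimension, the taller each kernel becomes, which is why a small bandwidth in 10 dimensions can produce large positive log-densities at the training points.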