
sklearn KernelDensity score_samples giving values greater than 0

I am using sklearn's KernelDensity to estimate a density and then evaluating the pdf at some points with score_samples, but the values it returns are much greater than 0. That shouldn't be the case, because according to the documentation it returns log(density) [Documentation: The array of log(density) evaluations. These are normalized to be probability densities, so values will be low for high-dimensional data.]

from sklearn.neighbors import KernelDensity  # the sklearn.neighbors.kde module path is deprecated
import numpy as np

data = np.random.normal(0, 1, [50, 10])  # 50 data points, dimension=10
data_kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(data)
output = data_kde.score_samples(data)

output = array([19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645])

Since density lies in [0, 1], log(density) should lie in (-Inf, 0], unlike the 19.9448 shown above.
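Why every value is identical can be sketched as follows, assuming a recent scikit-learn and a fixed seed for repeatability. When the bandwidth h is small relative to the distances between points (in 10 dimensions, two standard-normal points are typically about 4.5 apart, i.e. over 20 bandwidths), each training point's density comes almost entirely from its own kernel. The log-density is then roughly log(1/n) - (d/2)·log(2πh²), which is the same for every point and positive when h is small. The exact constant depends on n, d and h; the variable name `self_only` is just for illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

n, d, h = 50, 10, 0.2
rng = np.random.default_rng(0)  # fixed seed, only so the run is repeatable
data = rng.normal(0.0, 1.0, size=(n, d))

kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(data)
output = kde.score_samples(data)

# If all other points are many bandwidths away, only the point's own kernel
# contributes: density ~= (1/n) * (2*pi*h**2) ** (-d/2)
self_only = -np.log(n) - 0.5 * d * np.log(2 * np.pi * h**2)

print(output[:3])
print(self_only)
```

Evaluating a KDE on the same points it was fitted on, with a narrow bandwidth, mostly measures the height of a single kernel, which grows like h^(-d).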


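  • Probability densities don't have to lie in [0, 1]. A density is not a probability: it only has to be non-negative and integrate to 1, so it can exceed 1 wherever the probability mass is concentrated, and its log is then positive. For example, a 1-D Gaussian with standard deviation 0.1 has density 1/(0.1·√(2π)) ≈ 3.99 at its mean. The Wikipedia page on probability density functions gives a good overview.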
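A small check of this point, assuming a recent scikit-learn: a 1-D KDE with a narrow bandwidth has positive log-density at its samples, yet exp(score_samples) still integrates to about 1. The five data points, the 0.05 bandwidth and the integration grid are arbitrary choices for illustration, and the integral is a plain Riemann sum rather than anything provided by the library.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# A handful of 1-D points and a narrow bandwidth (both chosen only for illustration)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(5, 1))
kde = KernelDensity(kernel="gaussian", bandwidth=0.05).fit(x)

# At each sample the density is at least (1/5) * 1/(0.05*sqrt(2*pi)) ~= 1.6,
# so the log-density there is positive
log_dens_at_data = kde.score_samples(x)
print(log_dens_at_data)

# Yet the density still integrates to 1: Riemann sum over a fine grid
grid = np.linspace(-8.0, 8.0, 16_001).reshape(-1, 1)
dx = grid[1, 0] - grid[0, 0]
integral = np.exp(kde.score_samples(grid)).sum() * dx
print(integral)  # close to 1
```

"Normalized to be probability densities" in the documentation refers to this integral, not to the size of individual values.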