I am using sklearn's KernelDensity to estimate a density and then evaluate the pdf at some points with score_samples, but the values returned by score_samples are much greater than 0. That shouldn't be the case because, according to the documentation, it returns log(density):
[Documentation: The array of log(density) evaluations. These are normalized to be probability densities, so values will be low for high-dimensional data.]
from sklearn.neighbors import KernelDensity
import numpy as np

data = np.random.normal(0, 1, [50, 10])  # 50 data points, dimension = 10
data_kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(data)
output = data_kde.score_samples(data)  # log-density evaluated at each training point
#print(output)
output = array([19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645])
Since density lies in [0, 1], log(density) should lie in (-Inf, 0], unlike the 19.9448 shown above.
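Probability densities don't have to lie in [0, 1]. A density is not a probability: only its integral over a region is a probability, so the density itself can exceed 1 wherever the probability mass is concentrated in a small volume, and log(density) is then positive. A small bandwidth such as 0.2 gives very narrow kernels, which is exactly that situation. The Wikipedia page on probability density functions gives a good overview of pdfs.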
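To make this concrete, here is a minimal sketch that uses a one-dimensional normal distribution rather than the KernelDensity estimate from the question. The helper normal_pdf and the choice sigma = 0.1 are illustrative assumptions, not part of the original code. It shows a density above 1 (and so a positive log-density) that still integrates to roughly 1.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a 1-D normal distribution at x (hypothetical helper)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


sigma = 0.1  # a narrow distribution, chosen only for illustration

peak = normal_pdf(0.0, sigma=sigma)  # density at the mean
log_peak = math.log(peak)            # positive whenever peak > 1

# Riemann sum over [-1, 1] (ten standard deviations on each side)
# to check that the density still integrates to about 1.
step = 1e-4
n_steps = 20000
total = step * sum(
    normal_pdf(-1.0 + i * step, sigma=sigma) for i in range(n_steps + 1)
)

print("density at the mean:", peak)
print("log-density at the mean:", log_peak)
print("approximate integral:", total)
```

The same thing happens with the kernel density estimate: each Gaussian kernel with a small bandwidth is a narrow bump, and in several dimensions the normalizing constants multiply, so large positive log-densities at the training points are not by themselves a sign of a bug.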