My question is, how to return the mean value and variance of the estimated "kde"? Or is there any other package you known that can easily output mean value or variance value, like print kde.mean()
or print kde.get_parameter(mean)
?
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import KernelDensity
N = 100
np.random.seed(1)
X = np.concatenate((np.random.normal(0, 1, int(0.3 * N)),np.random.normal(5, 1, int(0.7 * N))))[:, np.newaxis]
X_plot = np.linspace(-5, 10, 1000)[:, np.newaxis]
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)
In general, you need to do this numerically. I suggest 2 different approaches:
These approaches work for any kernel and any bandwidth.
Uses the fact that once we know the probability density function, we can easily compute mean and variance via integration.
Note that in scikit-learn
the method score_samples
returns log pdf and therefore one needs to "exp" it.
The idea here is to simply sample from your KDE and estimate population mean and variance via sample mean and variance.
import numpy as np
from scipy.integrate import quad
from sklearn.neighbors import KernelDensity
N = 100
np.random.seed(1)
X = np.concatenate((np.random.normal(0, 1, int(0.3 * N)),np.random.normal(5, 1, int(0.7 * N))))[:, np.newaxis]
X_plot = np.linspace(-5, 10, 1000)[:, np.newaxis]
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)
# Mean and Variance - Integration
pdf = lambda x : np.exp(kde.score_samples([[x]]))[0]
mean_integration = quad(lambda x: x * pdf(x), a=-np.inf, b=np.inf)[0]
variance_integration = quad(lambda x: (x ** 2) * pdf(x), a=-np.inf, b=np.inf)[0] - mean_integration ** 2
# Mean and Variance - Monte Carlo
n_samples = 10000000
samples = kde.sample(n_samples)
mean_mc = samples.mean()
variance_mc = samples.var()
print('Mean:\nIntegration: {}\nMonte Carlo: {}\n'.format(mean_integration, mean_mc))
print('Variance\nIntegration: {}\nMonte Carlo: {}\n'.format(variance_integration, variance_mc))
Output:
Mean: Integration: 3.560582852075697 Monte Carlo: 3.5595633705830934
Variance: Integration: 6.645066811078639 Monte Carlo: 6.646732489654485