Search code examples
pythonkernelscipybandwidth

Getting bandwidth used by SciPy's gaussian_kde function


I'm using SciPy's stats.gaussian_kde function to generate a kernel density estimate (kde) function from a data set of x,y points.

This is a simple MWE of my code:

import numpy as np
from scipy import stats

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N)

# Data lists.
x_data = random_data(100)
y_data = random_data(100)

# Obtain the gaussian kernel.
kernel = stats.gaussian_kde(np.vstack([x_data, y_data]))

Since I'm not setting a bandwidth manually (via the bw_method key), the function defaults to using Scott's rule (see function's description). What I need is to obtain this bandwidth value set automatically by the stats.gaussian_kde function.

I've tried using:

print kernel.set_bandwidth()

but it always returns None instead of a float.


Solution

  • Short answer

    The bandwidth is kernel.covariance_factor() multiplied by the std of the sample that you are using.

    (This is in the case of 1D sample and it is computed using Scott's rule of thumb in the default case).

    Example:

    from scipy.stats import gaussian_kde
    sample = np.random.normal(0., 2., 100)
    kde = gaussian_kde(sample)
    f = kde.covariance_factor()
    bw = f * sample.std()
    

    The pdf that you get is this:

    from pylab import plot
    x_grid = np.linspace(-6, 6, 200)
    plot(x_grid, kde.evaluate(x_grid))
    

    enter image description here

    You can check it this way, If you use a new function to create a kde using, say, sklearn:

    from sklearn.neighbors import KernelDensity
    def kde_sklearn(x, x_grid, bandwidth):
        kde_skl = KernelDensity(bandwidth=bandwidth)
        kde_skl.fit(x[:, np.newaxis])
        # score_samples() returns the log-likelihood of the samples
        log_pdf = kde_skl.score_samples(x_grid[:, np.newaxis])
        pdf = np.exp(log_pdf)
        return pdf
    

    Now using the same code from above you get:

    plot(x_grid, kde_sklearn(sample, x_grid, f))
    

    enter image description here

    plot(x_grid, kde_sklearn(sample, x_grid, bw))
    

    enter image description here