Search code examples
pythonarrayskernel-density

KDE in python with different mu, sigma / mapping a function to an array


I have a 2-dimensional array of values that I would like to perform a Gaussian KDE on, with a catch: the points are assumed to have different variances. For that, I have a second 2-dimensional array (with the same shape) that is the variance of the Gaussian to be used for each point. In the simple example,

import numpy as np
data = np.array([[0.4,0.2],[0.1,0.5]])
sigma = np.array([[0.05,0.1],[0.02,0.3]])

there would be four gaussians, the first of which is centered at x=0.4 with σ=0.05. Note: Actual data is much larger than 2x2

I am looking for one of two things:

  1. A Gaussian KDE solver that will allow for bandwidth to change for each point

or

  1. A way to map the results of each Gaussian into a 3-dimensional array, with each Gaussian evaluated across a range of points (say, evaluate each center/σ pair along np.linspace(0,1,101)). In this case, I could e.g. have the KDE value at x=0.5 by taking outarray[:,:,51].

Solution

  • The best way I found to handle this is through array multiplication of a sigma array and a data array. Then, I stack the arrays for each value I want to solve the KDE for.

    import numpy as np
    
    def solve_gaussian(val,data_array,sigma_array):
        return (1. / sigma_array) * np.exp(- (val - data_array) * (val - data_array) / (2 * sigma_array * sigma_array))
    
    def solve_kde(xlist,data_array,sigma_array):
        kde_array = np.array([])
        for xx in xlist:
            single_kde = solve_gaussian(xx,data_array,sigma_array)
            if np.ndim(kde_array) == 3:
                kde_array = np.concatenate((kde_array,single_kde[np.newaxis,:,:]),axis=0)
            else:
                kde_array = np.dstack(single_kde)
        return kde_array
    
    xlist = np.linspace(0,1,101) #Adjust as needed
    kde_array = solve_kde(xlist,data_array,sigma_array)
    kde_vector = np.sum(np.sum(kde_array,axis=2),axis=1)
    mode_guess = xlist[np.argmax(kde_vector)]
    

    Caveat, for anyone attempting to use this code: the value of the Gaussian is along axis 0, not axis 2 as specified in the original question.