python, histogram, gaussian, smoothing

KL (Kullback-Leibler) distance with histogram smoothing in Python


I have two lists of numbers (of different lengths). Using Python, I want to compute a histogram of each with, say, 10 bins. Then I want to smooth the two histograms with a standard kernel (a Gaussian kernel with mean = 0, sigma = 1), and finally calculate the KL distance between the two smoothed histograms. I found some code for computing histograms, but I am not sure how to apply the standard kernel for smoothing or how to calculate the KL distance. Please help.


Solution

  • To compute the histograms you can use numpy.histogram(), and for Gaussian smoothing scipy.ndimage.gaussian_filter() (older code imports it from scipy.ndimage.filters, a namespace that recent SciPy releases deprecate). Kullback-Leibler divergence code can be found here.

    A function that does the required calculation would look something like this:

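    import numpy as np
    from scipy.ndimage import gaussian_filter
    
    def kl(p, q):
        # Normalize both inputs so they are probability distributions.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        p, q = p / p.sum(), q / q.sum()
    
        # Bins with p == 0 contribute 0. The result is inf if q == 0 where p > 0.
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))
    
    def smoothed_hist_kl_distance(a, b, nbins=10, sigma=1):
        # Share one range so bin i covers the same interval in both histograms.
        lo, hi = min(np.min(a), np.min(b)), max(np.max(a), np.max(b))
        ahist, bhist = (np.histogram(a, bins=nbins, range=(lo, hi))[0],
                        np.histogram(b, bins=nbins, range=(lo, hi))[0])
    
        # Smooth as floats: gaussian_filter returns the input dtype, so
        # integer counts would come back stored as integers.
        asmooth, bsmooth = (gaussian_filter(ahist.astype(float), sigma),
                            gaussian_filter(bhist.astype(float), sigma))
    
        return kl(asmooth, bsmooth)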
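
As a cross-check, here is a minimal self-contained sketch of the same pipeline that uses scipy.stats.entropy for the KL step. When given two arrays it normalizes both and returns sum(pk * log(pk / qk)). The function name smoothed_kl, the eps parameter, and the sample data are my own assumptions for illustration, not part of the question. Adding eps is just one simple way to keep every bin nonzero so the result stays finite, and note that sigma is measured in bins, not in data units.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.stats import entropy

def smoothed_kl(a, b, nbins=10, sigma=1.0, eps=1e-12):
    """KL divergence between Gaussian-smoothed histograms of a and b.

    eps is a hypothetical small constant added to every bin so that
    no bin of either smoothed histogram is exactly zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shared range so both histograms use identical bin edges.
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ahist, _ = np.histogram(a, bins=nbins, range=(lo, hi))
    bhist, _ = np.histogram(b, bins=nbins, range=(lo, hi))
    # Smooth as floats; sigma is expressed in number of bins.
    asmooth = gaussian_filter1d(ahist.astype(float), sigma) + eps
    bsmooth = gaussian_filter1d(bhist.astype(float), sigma) + eps
    # entropy(pk, qk) normalizes both inputs and computes the relative entropy.
    return entropy(asmooth, bsmooth)

# Illustrative data: two samples of different lengths (assumed, not from the question).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(1.0, 1.5, size=800)

print(smoothed_kl(x, x))  # identical inputs: divergence of zero
print(smoothed_kl(x, y))  # different samples: positive value
print(smoothed_kl(y, x))  # KL is not symmetric, so this generally differs
```

Because KL is not symmetric, it matters which list you treat as the reference distribution. If you need a true distance, you would have to symmetrize it yourself.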