Search code examples
rnumerical-integrationkernel-density

Computing the KL divergence with kernel density estimates


I have two vectors of varying length, and I want to compute the DKL from the density estimates using the density() function in R.

The equation of the DKL is the following.

enter image description here

I think I can use numerical integration, say

kde1 = density(x)
kde2 = density(y)
f1 = approxfun(kde1$x,kde1$y,rule=2)
f2 = approxfun(kde2$x,kde2$y,rule=2)
kde_f = function(f1,f2){
  f1 * log2(f1/f2)
}

Then integrate over kde_f, e.g.

integrate(f = kde_f,lower=0, upper=100)

Of course, this doesn't work, but I wrote this as the main idea of what I want to do. I don't have idea of how to proceed, or even if this have sense. Any help will be really appreciated.


Solution

  • I came to this solution

    kld_base = function(x,y,...){
      integrand = function(x,y,t){
        f.x =  approx(density(x)$x,density(x)$y,t)$y
        f.y =  approx(density(y)$x,density(y)$y,t)$y
        tmpRatio = f.x *(log2(f.x) - log2(f.y))
        tmpRatio = ifelse(is.infinite(tmpRatio),0,ifelse(is.na(tmpRatio),0,tmpRatio))
        return(tmpRatio)
      }
      return(integrate(integrand,-Inf,Inf,x = x,y = y,stop.on.error=FALSE)$value)
    }
    
    set.seed(13)
    x = rnorm(100)
    y = rnorm(100)
    kld_base(x,y)
    # [1] 0.06990757
    

    I will let the question open for a while, if someone have a better solution than mine please feel free to comment.