Search code examples
rgaussiankernel-densityprobability-density

Discrepancies in the density() kernel estimator compared to calculations by scratch


I am trying to calculate the Gaussian kernel density, and to test of my knowledge of the density() function, I decided to calculate it from scratch and compare the two results.

However, they do not provide the same answer.

I start with an existing dataset

xi <- mtcars$mpg

and can plot the kernel density of this data, as follows

plot(density(xi, kernel = "gaussian"))

which provides this...

Automated gaussian kernel density

I then grab some of the details from this calculation, so that my calculation is consistent.

auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction

I then calculate the gaussian kernel density myself, and I have done this in a loop, just so it is clearer to read.

fx0 <- NULL

for (j in 1:length(x0)){

    t <- abs(x0[j]-xi)/h

    K <- (1/sqrt(2*pi))*exp(-(t^2)/2)

    fx0 <- c(fx0,sum(K*t)/(length(t)*h))
}

The basic calculation has been constructed following the details in section 3.3.6 in Statistical Methods in the Atmospheric Sciences, 3rd Edition, by Daniel Wilks. Equation 3.13 from Wilks textbook with the Gaussian kernel set as enter image description here and t being enter image description here

However, and here is my problem.

I then plot the two together...

plot(y=fx0,x=x0, type="l", ylim=c(0,0.07))
lines(x=auto.dens$x, y=auto.dens$y, col="red")

The output from the density function (red), and my calculations (black), I get enter image description here

!These two calculations are clearly different!

Have I miss understood how the density function works? Why can't I manage to calcualte the same results from scratch? Why is my kernel estimator providing different results? Why are my results less smooth?

I need to construct and apply a kernel smoother (not just of density) to a much more complicated dataset, and only did this little example to make sure I was doing the same as the the automated functions, and really wasn't expecting the have this problem. I've tried all kinds of things, and just cannot see why I get a different result.

Thank you all in advance, for reading and any comments, little or big.

Edit: 13:40 29/11/2016 Solution as detailed in answer below enter image description here


Solution

  • You don't need to sum(K*t), just sum(K).

    xi <- mtcars$mpg
    plot(density(xi, kernel = "gaussian"), lwd = 2)
    
    auto.dens <- density(xi, kernel = "gaussian")
    h <- auto.dens$bw # bandwidth for kernel
    x0 <- auto.dens$x # points for prediction
    
    fx0 <- NULL
    for (j in 1:length(x0)) {
      t <- abs(x0[j]-xi)/h
      K <- (1/sqrt(2*pi))*exp(-(t^2)/2)
      fx0 <- c(fx0, sum(K)/(length(t)*h))
    }
    
    lines(x0, fx0, col = "red", lty = "dotted")