I am trying to calculate the Gaussian kernel density, and to test of my knowledge of the density()
function, I decided to calculate it from scratch and compare the two results.
However, they do not provide the same answer.
I start with an existing dataset
xi <- mtcars$mpg
and can plot the kernel density of this data, as follows
plot(density(xi, kernel = "gaussian"))
which provides this...
I then grab some of the details from this calculation, so that my calculation is consistent.
auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction
I then calculate the gaussian kernel density myself, and I have done this in a loop, just so it is clearer to read.
fx0 <- NULL
for (j in 1:length(x0)){
t <- abs(x0[j]-xi)/h
K <- (1/sqrt(2*pi))*exp(-(t^2)/2)
fx0 <- c(fx0,sum(K*t)/(length(t)*h))
}
The basic calculation has been constructed following the details in section 3.3.6 in Statistical Methods in the Atmospheric Sciences, 3rd Edition, by Daniel Wilks.
with the Gaussian kernel set as
and t being
However, and here is my problem.
I then plot the two together...
plot(y=fx0,x=x0, type="l", ylim=c(0,0.07))
lines(x=auto.dens$x, y=auto.dens$y, col="red")
The output from the density function (red), and my calculations (black), I get
!These two calculations are clearly different!
Have I miss understood how the density function works? Why can't I manage to calcualte the same results from scratch? Why is my kernel estimator providing different results? Why are my results less smooth?
I need to construct and apply a kernel smoother (not just of density) to a much more complicated dataset, and only did this little example to make sure I was doing the same as the the automated functions, and really wasn't expecting the have this problem. I've tried all kinds of things, and just cannot see why I get a different result.
Thank you all in advance, for reading and any comments, little or big.
You don't need to sum(K*t)
, just sum(K)
.
xi <- mtcars$mpg
plot(density(xi, kernel = "gaussian"), lwd = 2)
auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction
fx0 <- NULL
for (j in 1:length(x0)) {
t <- abs(x0[j]-xi)/h
K <- (1/sqrt(2*pi))*exp(-(t^2)/2)
fx0 <- c(fx0, sum(K)/(length(t)*h))
}
lines(x0, fx0, col = "red", lty = "dotted")