Search code examples
rstatisticskernel-density

Getting values from kernel density estimation in R


I am trying to get density estimates for the log of stock prices in R. I know I can plot it using plot(density(x)). However, I actually want values for the function.

I'm trying to implement the kernel density estimation formula. Here's what I have so far:

a <- read.csv("boi_new.csv", header=FALSE)
S = a[,3] # takes column of increments in stock prices
dS=S[!is.na(S)] # omits first empty field

N = length(dS)                  # Sample size
rseed = 0                       # Random seed
x = rep(c(1:5),N/5)             # Inputted data

set.seed(rseed)   # Sets random seed for reproducibility

QL <- function(dS){
    h = density(dS)$bandwidth
    r = log(dS^2)
    f = 0*x
    for(i in 1:N){
        f[i] = 1/(N*h) * sum(dnorm((x-r[i])/h))
    }
    return(f)
}

QL(dS)

Any help would be much appreciated. Been at this for days!


Solution

  • You can pull the values directly from the density function:

    x = rnorm(100)
    d = density(x, from=-5, to = 5, n = 1000)
    d$x
    d$y
    

    Alternatively, if you really want to write your own kernel density function, here's some code to get you started:

    1. Set the points z and x range:

      z = c(-2, -1, 2)
      x = seq(-5, 5, 0.01)
      
    2. Now we'll add the points to a graph

      plot(0, 0, xlim=c(-5, 5), ylim=c(-0.02, 0.8), 
           pch=NA, ylab="", xlab="z")
      for(i in 1:length(z)) {
         points(z[i], 0, pch="X", col=2)
      }
       abline(h=0)
      
    3. Put Normal density's around each point:

      ## Now we combine the kernels,
      x_total = numeric(length(x))
      for(i in 1:length(x_total)) {
        for(j in 1:length(z)) {
          x_total[i] = x_total[i] + 
            dnorm(x[i], z[j], sd=1)
        }
      }
      

      and add the curves to the plot:

      lines(x, x_total, col=4, lty=2)
      
    4. Finally, calculate the complete estimate:

      ## Just as a histogram is the sum of the boxes, 
      ## the kernel density estimate is just the sum of the bumps. 
      ## All that's left to do, is ensure that the estimate has the
      ## correct area, i.e. in this case we divide by $n=3$:
      
      plot(x, x_total/3, 
             xlim=c(-5, 5), ylim=c(-0.02, 0.8), 
             ylab="", xlab="z", type="l")
      abline(h=0)
      

      This corresponds to

      density(z, adjust=1, bw=1)
      

    The plots above give:

    enter image description here