Search code examples
rkernel-densitydensity-plot

Weibull Kernel Density Estimation in R


I am trying to evaluate the work of Salha et al. (2014), entitled as "Hazard Rate Function Estimation Using Weibull Kernel". But my density graph (for real data) is just a flat line rather then a proper density plot similar to article. Here is expected density,my R code and Weibull kernel. kindly help me to find out my mistake.

Expected Density Weibull Kernel

R code:

k<-200
yy<-c(1,1,1,5,7,8,8,13,14,14,17,18,21,21,22,25,27,27,30,30,31,31,32,34,35,36,37,38,39,39,40,49,49,54,56,56,62,63,65,65,67,75,76,79,82,83,84,84,84,90,91,92,93,93,103,103,111,112,119,122,123,126,129,134,144,147,153,163,167,175,228,231,235,242,256,256,257,311,314,322,369,415,573,609,640,737)
y<-log(yy)
n<-length(yy)

 h<-0.79 * IQR(y) * length(y) ^ (-1/5)
 x <- seq(min(yy) + 0.05, max(yy), length = k)

 KWeibull <- matrix(rep(0, k * n), ncol = k)
 fhat <- rep(0, k)
###########weibull###########
for (j in 1:k) {
    for (i in 1:n) {
        fn <- gamma(1 + h)
        KWeibull[i, j] <- (fn/(h * x[i])) * ((yy[i] * fn)/x[i])^((1/h) - 1) * exp(-((yy[i] * 
            fn)/x[i])^(1/h))
    }
    fhat[j] <- 1/n * (sum(KWeibull[, j]))
}

plot(x,fhat, type = "l")

Solution

  • Hope this helps:

    There are two issues needing to be addressed to attain the above plot:

    1. Use Logarithm

    Firstly (in the paper which you are referring to) they use the logarithm of the input data - I found this in section 5.2 of the paper --- below is this fix:

    k<-200
    yy<-c(1,1,1,5,7,8,8,13,14,14,17,18,21,21,22,25,27,27,30,30,31,31,32,34,35,36,37,38,39,39,40,49,49,54,56,56,62,63,65,65,67,75,76,79,82,83,84,84,84,90,91,92,93,93,103,103,111,112,119,122,123,126,129,134,144,147,153,163,167,175,228,231,235,242,256,256,257,311,314,322,369,415,573,609,640,737)
    y<-log(yy)
    n<-length(yy)
    
    #h<-0.79 * IQR(y) * length(y) ^ (-1/5)
    x <- seq(min(y) + 0.05, max(y), length = k)
    h <- 0.480411
    
    KWeibull <- matrix(rep(0, k * n), ncol = k)
    fhat <- rep(0, k)
    

    Note the bandwidth (h) is hard coded to be equivalent to the research papers bandwidth, however this is not a crucial fix.

    1. For loop Indexing

    The for loop - you are iterating over your yy (which I think is your time variable in the kernel density estimator) but your x random sample your are iterating over the same set each time. And also use y rather than yy as this is the logarithmic transformed data.

    See below fix: (This includes the Logarithm fix)

    ###########weibull###########
    for (j in 1:k) {
      for (i in 1:n) {
        fn <- gamma(1 + h)
        KWeibull[i, j] <- (fn/(h * x[j])) * ((y[i] * fn)/x[j])^((1/h) - 1) * exp(-((y[i] * 
                                                                                       fn)/x[j])^(1/h))
      }
      fhat[j] <- 1/n * (sum(KWeibull[, j]))
    }
    
    plot(yy,KWeibull[,86], type = "l")
    plot(x,fhat, type = "l")
    

    enter image description here