Search code examples
rfunctionoptimizationdistributioncurve

Finding an interval that covers 95% area under an asymmetrical curve


I want to move from the mode (the red vertical line) of the curve called posterior toward the tails and stop when 95% of the area of the posterior is covered. My desire is to find the shortest interval (in x-axis units) that can do so. The two limit values of such interval are desired?

Note: I have tried the first solution HERE. But that solution DOES NOT work with this current problem!

P.S. Note that my posterior curve is NOT symmetric. Thus the shortest possible 95% is the best choice.

Here are my functions:

     prior = function(x) dnorm(x)
likelihood = function(x) dt(1.46, 19, x*sqrt(20))
 posterior = function(x) prior(x)*likelihood(x)

mode = optimize(posterior, interval = c(-2, 2), maximum = TRUE, tol = 1e-12)[[1]]
curve(posterior, -2, 2, n = 1e4)
abline(v = mode, col = 2)

enter image description here


Solution

  • I believe the way to solve this problem is analogous to that found in coda::HPDinterval (which works on a density); starting from the peak of the curve, move a horizontal line down; for each level, invert the two halves of the curve to find the intersection points; measure the area between the intersection points.

    Setup:

    prior = function(x) dnorm(x)
    likelihood = function(x) dt(1.46, 19, x*sqrt(20))
    posterior = function(x) prior(x)*likelihood(x)
    
    mode = optimize(posterior, interval = c(-2, 2), maximum = TRUE, tol = 1e-12)[[1]]
    curve(posterior, -2, 2, n = 1e4)
    abline(v = mode, col = 2)
    

    Function inverse of the posterior distribution, one side at a time:

    inverse.posterior <- function(x,side="left") {
      target <- function(y) posterior(y)-x
      ur <- switch(side,
        left=uniroot(target,interval=c(-2,mode)),
        right=uniroot(target,interval=c(mode,2)))
      return(ur$root)
    }
    
    i1 <- inverse.posterior(0.07,"left")
    i2 <- inverse.posterior(0.07,"right")
    abline(h=0.07,col="gray")
    abline(v=c(i1,i2),col="gray")
    

    Compute the area corresponding to a given horizontal cutoff:

    areafun <- function(h) {
      i1 <- inverse.posterior(h,"left")
      i2 <- inverse.posterior(h,"right")
      return(integrate(posterior,i1,i2)$value)
    }
    
    areafun(0.07)
    

    Find the height that gives a specific fraction of the density:

    post.area <- integrate(posterior,-2,2)$value
    find.lims <- function(a) {
      ur <- uniroot(function(h) areafun(h)/post.area-a,
           c(0.01,posterior(mode)-0.01))
      return(ur$root)
    }
    

    Try it out:

    f <- find.lims(0.95)
    ## critical height = 0.02129
    lwr <- inverse.posterior(f,"left")  ## -0.124
    upr <- inverse.posterior(f,"right") ## 0.753
    integrate(posterior,lwr,upr)$value/post.area ## 0.9499
    

    For your second problem (Cauchy) I decided to encapsulate my solution into a function. tl;dr it works if you make the limits wide enough.

    get.HPDinterval <- function(posterior,lwr=-2,upr=2,level=0.95,eps=0.001) {
       mode = optimize(posterior, interval = c(lwr, upr), maximum = TRUE, tol = 1e-12)[[1]]
      inverse.posterior <- function(x,side="left") {
        target <- function(y) posterior(y)-x
        ur <- switch(side,
                     left=try(uniroot(target,interval=c(lwr,mode))),
                     right=try(uniroot(target,interval=c(mode,upr))))
        if (inherits(ur,"try-error")) stop("inverse.posterior failed: extend limits?")
        return(ur$root)
      }
      areafun <- function(h) {
        i1 <- inverse.posterior(h,"left")
        i2 <- inverse.posterior(h,"right")
        return(integrate(posterior,i1,i2)$value)
      }
      post.area <- integrate(posterior,lwr,upr)$value
      if (post.area<level) stop("limits don't encompass desired area: a=",round(post.area,3))
      find.lims <- function(a) {
         ur <- uniroot(function(h) areafun(h)/post.area-a,
                       c(eps,posterior(mode)-eps))
      return(ur$root)
      }
      f <- find.lims(level)
      return(c(inverse.posterior(f,"left"),
               inverse.posterior(f,"right")))
    }
    
    get.HPDinterval(posterior)
    
    posterior2 = function(x) dcauchy(x)
    get.HPDinterval(posterior2,-10,10)  ## limits don't encompass desired area
    get.HPDinterval(posterior2,-15,15)  ## inverse.posterior failed: extend limits?
    get.HPDinterval(posterior2,-20,20)  ## -7.83993 7.83993