Search code examples
rprobabilitydistributionfitdistrplus

How to find interval prbability for a given distribution?


Suppose I have some data and I fit them to a gamma distribution, how to find the interval probability for Pr(1 < x <= 1.5), where x is an out-of-sample data point?

require(fitdistrplus)

a <- c(2.44121289,1.70292449,0.30550832,0.04332383,1.0553436,0.26912546,0.43590885,0.84514809,
0.36762336,0.94935435,1.30887437,1.08761895,0.66581035,0.83108270,1.7567334,1.00241339,
0.96263021,1.67488277,0.87400413,0.34639636,1.16804671,1.4182144,1.7378907,1.7462686,
1.7427784,0.8377457,0.1428738,0.71473956,0.8458882,0.2140742,0.9663167,0.7933085,
0.0475603,1.8657773,0.18307362,1.13519144)

fit <- fitdist(a, "gamma",lower = c(0, 0))

Solution

  • Someone does not like my above approach, which is conditional on MLE; now let's see something unconditional. If we take direct integration, we need a triple integration: one for shape, one for rate and finally one for x. This is not appealing. I will just produce Monte Carlo estimate instead.

    Under Central Limit Theorem, MLE are normally distributed. fitdistrplus::fitdist does not give standard error, but we can use MASS::fitdistr which would performs exact inference here.

    fit <- fitdistr(a, "gamma", lower = c(0,0))
    
    b <- fit$estimate
    #   shape     rate 
    #1.739737 1.816134 
    
    V <- fit$vcov  ## covariance
              shape      rate
    shape 0.1423679 0.1486193
    rate  0.1486193 0.2078086
    

    Now we would like to sample from parameter distribution and get samples of target probability.

    set.seed(0)
    ## sample from bivariate normal with mean `b` and covariance `V`
    ## Cholesky method is used here
    X <- matrix(rnorm(1000 * 2), 1000)  ## 1000 `N(0, 1)` normal samples
    R <- chol(V)  ## upper triangular Cholesky factor of `V`
    X <- X %*% R  ## transform X under desired covariance
    X <- X + b  ## shift to desired mean
    ## you can use `cov(X)` to check it is very close to `V`
    
    ## now samples for `Pr(1 < x < 1.5)`
    p <- pgamma(1.5, X[,1], X[,2]) - pgamma(1, X[,1], X[,2])
    

    We can make a histogram of p (and maybe do a density estimation if you want):

    hist(p, prob = TRUE)
    

    enter image description here

    Now, we often want sample mean for predictor:

    mean(p)
    # [1] 0.1906975