
Calculate the 95% credible interval for multiple columns of a data frame in R


I need to calculate the 95% credible interval for my data, which consists of ten columns and over 5000 rows. Here is some example data:

data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690), 
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619), 
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

I am not sure which function to use, since I get different results each time and the functions only work on one column at a time. I have tried the following:

ci(data$`A`, confidence = 0.95)  ## R package gmodels

and

CI(data$`A`, confidence = 0.95)  ## R package Rmisc

Has anyone else experienced the same problem?


Solution

  • If you want a credible interval (from Bayesian statistics), you need to make some additional choices, namely a prior and a likelihood. Some functions already come with defaults, so you may get away with those, but you should really know what you are doing before blindly applying such concepts. Here is an example for demonstration purposes.

    library(bayestestR)
    
    data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690), 
                       B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                       C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619), 
                       D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))
    
    sapply(data, ci, ci = 0.95)
    
            A         B         C        D        
    CI      95        95        95       95       
    CI_low  -8.662932 0.6095677 10.20833 -13.36208
    CI_high -4.294732 0.9383867 22.58649 -7.951079
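
For comparison, here is a minimal base-R sketch of two related quantities. It assumes that `ci()` from bayestestR, when given a plain numeric vector, treats the values as posterior draws and returns an equal-tailed interval; under that assumption the numbers above are simply the 2.5% and 97.5% quantiles of each column, which `quantile()` reproduces for the example data. The second part is not a credible interval at all: it is the usual t-based confidence interval for each column mean, in case that is what you are actually after. The names `eti` and `t_ci` are only illustrative.

```r
# Example data from the question.
data <- data.frame(A = c(-7.595932, -6.451768, -4.682111, -8.781488, -4.251690),
                   B = c(0.8324450, 0.9451657, 0.8773759, 0.6044753, 0.6553995),
                   C = c(22.747480, 15.477470, 18.745407, 9.622865, 21.137619),
                   D = c(-11.684762, -13.474299, -9.783277, -7.747501, -12.352081))

# Equal-tailed 95% interval of the values in each column.
# This treats each column as if it were a set of posterior draws
# (an assumption; raw observations are usually not posterior draws).
eti <- sapply(data, quantile, probs = c(0.025, 0.975))

# Frequentist 95% confidence interval for each column mean (t-based).
t_ci <- sapply(data, function(x) t.test(x, conf.level = 0.95)$conf.int)
rownames(t_ci) <- c("lower", "upper")

print(eti)
print(t_ci)
```

Both calls work on all columns at once, so they scale to ten columns and several thousand rows without any changes. The two results answer different questions: `eti` describes the spread of the values themselves, while `t_ci` describes the uncertainty about each column's mean.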