Search code examples
rggplot2plotdata-visualizationbayesian

Add Credible Intervals to each line


I would appreciate your help with the following issue.

I am trying to add the 95% Credival Intervals (this is not the same as Confidence Intervals) to a density plot, which is separated by source.

The best way to calculate the 95% CI is with the function hdi(x, ci = 0.95) ('HDInterval' package).
I would like to make the plot in ggplot(), because it is easier to manipulate.

Here are some simulated data:

set.seed(14)
df <- data.frame(density = c(rgamma(400, 2, 10), rgamma(400, 2.25, 9),rgamma(400, 5, 7)),
                 source = rep(c("source_1", "source_2", "source_3"), 
                              each = 400))

For the ggplot

ggplot(df, aes(x = density, color = source)) +
  geom_density(size=1.5, alpha = 0.4) +
  labs(y = "Density", x = "Source contribution")

And the plot where the filled areas represent the 95% CI.

enter image description here

I also calculated the lower and upper 95% CI individually.


S1 <- df %>% filter(df$source == "source_1")
S2 <- df %>% filter(df$source == "source_2")
S3 <- df %>% filter(df$source == "source_3")

data_lower <- tribble(~source, ~value,  ~hdi,
                      'S1' , 0.025, hdi(S1$density)["lower"],
                      'S2' , 0.025, hdi(S2$density)["lower"],
                      'S3' , 0.025, hdi(S3$density)["lower"])

data_upper <- tribble(~source, ~value, ~hdi,
                      's1', 0.975, hdi(S1$density)["upper"],
                      's2', 0.975, hdi(S2$density)["upper"],
                      's3', 0.975, hdi(S3$density)["upper"])

But they can also be calculated by source.

hdi(S1$density, ci = 0.95)
hdi(S2$density, ci = 0.95)
hdi(S3$density, ci = 0.95)

I would appreciate your help. Thanks in advance.


Solution

  • Adapting this answer by @ClausWilke to your case you could achieve your result via ggridges:: geom_density_ridges_gradient. Basically I use after_stat and an ifelse to fill only the upper and lower quantiles computed by hdi.

    set.seed(14)
    
    df <- data.frame(
      density = c(rgamma(400, 2, 10), rgamma(400, 2.25, 9), rgamma(400, 5, 7)),
      source = rep(c("source_1", "source_2", "source_3"),
        each = 400
      )
    )
    
    library(ggplot2)
    library(HDInterval)
    library(ggridges)
    
    ggplot(df, aes(x = density, color = source, fill = after_stat(ifelse(quantile == 2, NA, color)))) +
      geom_density_ridges_gradient(aes(y = 0), quantile_lines = TRUE, quantile_fun = hdi, vline_linetype = 0) +
      labs(y = "Density", x = "Source contribution") +
      scale_fill_discrete(guide = "none", na.value = "transparent")
    #> Picking joint bandwidth of 0.0546