Tags: r, ggplot2, graph, forest-plots, meta-analysis

Forestplot type graph but not meta-analysis, with multiple columns


I'd like to create a forestplot type graph, but it's not for a meta-analysis. I have been looking at Bluetongue vaccine statistics, and want to be able to display the results so they can be easily compared.

I've tried the forestplot package (https://cran.r-project.org/web/packages/forestplot/vignettes/forestplot.html) but I keep getting all sorts of errors, presumably because I don't want all the stats it calculates.

I know I could do this in ggplot quite easily (https://www.statology.org/forest-plot-in-r/), but I would like to have multiple columns on the left side of each bar, one per variable, and I'm not sure how to do this in ggplot, because they're essentially y-axis labels, not table columns.

Some simplified example code:

data <- data.frame(Study = c("study 1", "study 2", "study 3"),
                   Vaccine_serotype = c(2,5,6),
                   Viral_challenge_serotype = c(2,5,8),
                   Booster = c('Yes', 'No', 'No'),
                   Sample_size = c(4,10,6),
                   Percentage_inhibition = c(100, 98, 70),
                   Mean_days_seropositivity = c(14, 12, 16),
                   Min_days_seropositivity = c(7, 7, 10),
                   Max_days_seropositivity = c(16, 15, 19))

From this data I would like to create two plots.

The first one would be a forest plot with the columns (with nicely formatted titles):

  • Study
  • Vaccine serotype
  • Viral challenge serotype
  • Booster?

Then the forest plot next to it with the mean days to seropositivity being the point, and the bars extending to the minimum and maximum days to seropositivity.

And then the "zero" line would be extended from the mean of all the studies (i.e. 14 for this example).

And a big bonus would be if the points could vary in size depending on the sample size of the experiment. If not, the sample size would need to be another column.

And then I'd like a second plot which is similar, but with percentage inhibition being the points, and no extensions of min/max (because there isn't that data), and the "zero" line being the extension of the mean percentage inhibition (i.e. 89.3 for this example). And again the points varying in size dependent on the sample size.

Thanks


Solution

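  • I don't know of an easy way to do this, but you can place every element by hand in a single ggplot: draw the points and bars as usual, then add each table column as a text layer at a fixed x position to the left of the data. For the first plot: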

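    # geom_stripes() and theme_forest() come from ggforestplot, which may need
    # to be installed from GitHub (NightingaleHealth/ggforestplot), not CRAN.
    library(ggplot2)
    library(ggforestplot)
    
    ggplot(data, aes(Mean_days_seropositivity, Study)) +
      # Alternating background stripes, one per study row
      geom_stripes() +
      # Horizontal bars from minimum to maximum days to seropositivity
      geom_errorbarh(aes(xmin = Min_days_seropositivity,
                         xmax = Max_days_seropositivity), height = 0.1) +
      # Dashed reference line at the mean across all studies (14 here)
      geom_vline(xintercept = mean(data$Mean_days_seropositivity), linetype = 2) +
      # Point size reflects the sample size
      geom_point(aes(size = Sample_size)) +
      # White rectangle hides the stripes behind the table columns
      annotate('rect', xmin = -Inf, ymin = -Inf, ymax = Inf, xmax = 5,
               fill = 'white', color = NA) +
      # Table columns, placed by hand at fixed x positions left of the data
      geom_text(aes(x = -8, label = Study)) +
      geom_text(aes(x = -5, label = Vaccine_serotype)) +
      geom_text(aes(x = -2, label = Viral_challenge_serotype)) +
      geom_text(aes(x = 1, label = Booster)) +
      # Column headers, drawn just above the top row
      annotate('text', x = c(-8, -5, -2, 1), y = c(3.5, 3.5, 3.5, 3.5),
               label = c('Study', 'Vaccine\nserotype', 'Viral\nchallenge\nserotype',
                         'Booster?'), fontface = 'bold', vjust = 0) +
      theme_forest() +
      scale_x_continuous('Mean days seropositivity',
                         breaks = c(5, 10, 15, 20)) +
      scale_size(range = c(2, 5), breaks = c(4, 6, 8, 10)) +
      # clip = 'off' lets the headers extend into the enlarged top margin
      coord_cartesian(clip = 'off') +
      theme(axis.text.y = element_blank(),
            axis.title.y = element_blank(),
            plot.margin = margin(50, 20, 20, 20),
            legend.position = 'bottom',
            axis.title.x = element_text(hjust = 0.85))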
    

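The code above covers only the first plot. For the second plot (percentage inhibition as the points, no min/max bars), a minimal sketch along the same lines might look like this. It is not part of the original answer. It uses only ggplot2, with theme_classic() swapped in for theme_forest() and no stripes, so no white rectangle is needed. The x positions of the text columns (25 to 55) and the axis title offset are assumptions picked by eye for this example data and would need adjusting for other data.

```r
library(ggplot2)

# Example data from the question (only the columns this plot needs)
data <- data.frame(Study = c("study 1", "study 2", "study 3"),
                   Vaccine_serotype = c(2, 5, 6),
                   Viral_challenge_serotype = c(2, 5, 8),
                   Booster = c("Yes", "No", "No"),
                   Sample_size = c(4, 10, 6),
                   Percentage_inhibition = c(100, 98, 70))

# Reference line: mean percentage inhibition across studies (about 89.3)
ref_line <- mean(data$Percentage_inhibition)

p2 <- ggplot(data, aes(Percentage_inhibition, Study)) +
  geom_vline(xintercept = ref_line, linetype = 2) +
  # Points only (there is no min/max data), sized by sample size
  geom_point(aes(size = Sample_size)) +
  # Table columns as text layers; x positions are assumptions picked by eye
  geom_text(aes(x = 25, label = Study)) +
  geom_text(aes(x = 35, label = Vaccine_serotype)) +
  geom_text(aes(x = 45, label = Viral_challenge_serotype)) +
  geom_text(aes(x = 55, label = Booster)) +
  # Column headers just above the top row
  annotate("text", x = c(25, 35, 45, 55), y = 3.5,
           label = c("Study", "Vaccine\nserotype",
                     "Viral\nchallenge\nserotype", "Booster?"),
           fontface = "bold", vjust = 0) +
  scale_x_continuous("Percentage inhibition", breaks = c(70, 80, 90, 100)) +
  scale_size(range = c(2, 5), breaks = c(4, 6, 8, 10)) +
  # clip = "off" lets the headers extend into the enlarged top margin
  coord_cartesian(clip = "off") +
  theme_classic() +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line.y = element_blank(),
        plot.margin = margin(50, 20, 20, 20),
        legend.position = "bottom",
        axis.title.x = element_text(hjust = 0.8))

p2
```

The structure mirrors the first plot: only the x variable, the reference line, and the column positions change, and the error-bar layer is dropped.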