Tags: r, ggplot2, legend, r-forestplot, ggstatsplot

How do I add a legend indicating significance levels below a ggplot object?


I'm using the ggforestplot package to plot the results from several regression models in which some of the data were imputed with mice(). For the sake of this MWE, however, I will use the package's example data instead:

# Load packages
library(ggforestplot)
library(tidyverse)

# Use the example data from the ggforestplot package (only a few rows of it)
df <- ggforestplot::df_linear_associations %>%
  filter(trait == "BMI") %>% slice(26:29)

In my real analysis, I 1) ran summary(pool(modelFit)) separately for each outcome of my (partially imputed) data and 2) then gathered the results from those models into a single data frame. The structure of my real data frame is essentially identical to what the example code above produces. The example data frame:

# Display the example data
df
#> # A tibble: 4 × 5
#>   name           trait    beta      se   pvalue
#>   <chr>          <chr>   <dbl>   <dbl>    <dbl>
#> 1 Omega-3 %      BMI   -0.0187 0.00906 3.90e- 2
#> 2 DHA %          BMI    0.0232 0.00913 1.11e- 2
#> 3 Unsaturation   BMI   -0.120  0.00906 3.21e-40
#> 4 Sphingomyelins BMI    0.0608 0.00938 9.10e-11

With ggforestplot::forestplot(), it is extremely easy to make a forest plot of the above results:

# Create plot
ggforestplot::forestplot(
    df = df,
    name = name,
    estimate = beta,
    se = se,
    pvalue = pvalue,
    psignif = 0.002
  )

Question: how can I add a legend below the plot that explains the significance levels indicated by the dots in the plot? Compare with the result from another package, ggstats, and its function ggcoef_model():

# Load packages
library(ggstats)
# Load example data
data(tips, package = "reshape")
# Run linear model
linear_model <- lm(tip ~ size + total_bill, data = tips)
# Plot model
ggcoef_model(linear_model)

Created on 2023-09-10 with reprex v2.0.2


Solution

  • As with many ggplot extension packages, what you gain in ease of use from the wrapper function forestplot() you lose in the ability to customize your plot, including, as far as I can tell, the ability to easily add a legend for the filled versus hollow points.

    Fortunately, the package also contains the handy low-level functions geom_stripes() and theme_forest(), which make it easy to produce a ggplot that is almost identical to the original but includes a legend for the p values and gives you full control over the shapes, line thickness, colors, theme elements, etc.:

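    library(ggplot2)
    
    ggplot(df, aes(x = beta, y = name)) +
      ggforestplot::geom_stripes() +
      geom_vline(xintercept = 0) +
      # Interval of +/- qnorm(0.99) standard errors (a two-sided 98% interval)
      geom_linerange(aes(xmin = beta - qnorm(0.99) * se, 
                         xmax = beta + qnorm(0.99) * se)) +
      # Map the significance test to point shape so that it gets a legend;
      # note the cutoff here is 0.02, whereas the question used psignif = 0.002
      geom_point(aes(shape = pvalue < 0.02), fill = 'white', size = 3) +
      # FALSE -> shape 21 (hollow, white fill), TRUE -> shape 16 (solid)
      scale_shape_manual(NULL, values = c(21, 16), 
                         labels = c('Not significant', 'Significant')) +
      ggforestplot::theme_forest() +
      theme(legend.position = 'bottom')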
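
For a legend that matches the call in the question, the cutoff and the interval width can be pulled out into variables. The following is only a minimal sketch, not the package's own implementation. It rebuilds the four example rows by hand, assumes psignif = 0.002 as in the question and a two-sided 95% interval (ci, z, xmin, xmax and signif are illustrative names), and uses only ggplot2, so ggforestplot::geom_stripes() and ggforestplot::theme_forest() would need to be added back for the striped look.

```r
library(ggplot2)

# Values copied from the example tibble printed in the question.
df <- data.frame(
  name   = c("Omega-3 %", "DHA %", "Unsaturation", "Sphingomyelins"),
  beta   = c(-0.0187, 0.0232, -0.120, 0.0608),
  se     = c(0.00906, 0.00913, 0.00906, 0.00938),
  pvalue = c(3.90e-2, 1.11e-2, 3.21e-40, 9.10e-11)
)

# Assumed settings: the cutoff from the question's forestplot() call
# and a two-sided 95% confidence interval.
psignif <- 0.002
ci      <- 0.95
z       <- qnorm(1 - (1 - ci) / 2)

# Precompute the interval bounds and a two-level significance factor.
df$xmin <- df$beta - z * df$se
df$xmax <- df$beta + z * df$se
df$signif <- factor(
  ifelse(df$pvalue < psignif, "sig", "ns"),
  levels = c("ns", "sig")
)

p <- ggplot(df, aes(x = beta, y = name)) +
  geom_vline(xintercept = 0) +
  geom_linerange(aes(xmin = xmin, xmax = xmax)) +
  geom_point(aes(shape = signif), fill = "white", size = 3) +
  # Named vectors tie each shape and label to a factor level by name.
  scale_shape_manual(
    NULL,
    values = c(ns = 21, sig = 16),
    labels = c(ns = paste("p >=", psignif), sig = paste("p <", psignif))
  ) +
  theme(legend.position = "bottom")

p
```

Naming the entries of values and labels ties each shape to a factor level rather than to its position, so the mapping stays correct even if all rows are significant or none are.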
    
