Tags: r, ggplot2, r-forestplot, forest-plots

How to assign different colors to points based on values in a forest plot in R?


I am trying to create a forest plot in R using the forestplot package. I want to assign different colors to the points and lines based on whether the log.est values are above or below 0. Specifically, I want:

  • Blue for log.est > 0
  • Red for log.est <= 0

Here is my data:

df <- structure(list(variable = c("N.Acetylputrescine", "Homocitrulline", "Argininic.acid", 
               "SM.C16.1", "Oxalic.acid", "Cer.d18.1.22.0.", "Citrulline", 
               "X3.Hydroxybutyric.acid", "Glycine", "Cer.d18.1.25.0.", 
               "Uridine", "Cer.d18.1.24.1.", "Deoxyguanosine", "Adenine"), 
  log.est = c(18.12, 11.70, 11.61, 9.95, 8.79, 8.72, 7.07, 4.13, 2.63, 
              -5.85, -6.47, -6.81, -10.47, -14.84), 
  p.value = c(9.49e-05, 0.000196, 0.0117, 0.137, 7.44e-05, 0.251, 0.514, 
              0.000162, 0.0376, 0.909, 0.000858, 0.345, 0.000531, 1.4e-05), 
  log.lower = c(17.12, 10.62, 9.44, -8.30, 7.81, -8.23, -8.08, 3.08, 
                -1.49, -10.03, -7.14, -8.43, -11.12, -15.37), 
  log.upper = c(18.71, 12.31, 12.44, 11.16, 9.38, 10.16, 9.08, 4.74, 
                3.59, 9.86, -5.19, 6.91, -9.27, -13.97)
), class = "data.frame", row.names = c(NA, -14L))

Here is the code I used to generate the forest plot:

library(forestplot)

forestplot(
  labeltext = cbind("Variable" = df$variable,
                    "Coefficient" = round(df$log.est, 2),
                    "P-value" = format.pval(df$p.value)),
  mean = df$log.est,
  lower = df$log.lower,
  upper = df$log.upper,
  xlab = "Effect Size",
  title = "Logistic Regression"
)

This generates the forest plot, but all points and lines are the same color. How can I assign blue to points with log.est > 0 and red to points with log.est <= 0?

I tried the code below, but it colored all of the points and lines blue:

# Define colors based on log.est values
df$color <- ifelse(df$log.est > 0, "blue", "red")

# Create the forest plot
forestplot(labeltext = cbind("Variable" = df$variable,
                             "Coefficient" = round(df$log.est, 2),
                             "P-value" = format.pval(df$p.value)),
           mean = df$log.est,
           lower = df$log.lower,
           upper = df$log.upper,
           xlab = "Effect Size",
           title = "Logistic Regression",
           boxsize = 0.3, # Size of the points
           txt_gp = fpTxtGp(label = gpar(cex = 0.8)), # Text size customization
           col = fpColors(box = df$color, line = df$color, zero = "black"))

[Output image: every point and line in the plot is drawn in blue.]

Any help or suggestions would be greatly appreciated. Thank you!


Solution

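  • One option is to switch from forestplot to ggplot2, where the color column can be mapped directly onto each point and line: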

    library(ggplot2)
    
    # Order df by log.est (optional: reorder() below already sorts the y axis)
    df <- df[order(df$log.est, decreasing = TRUE), ]
    
    # Color column: blue for log.est > 0, red for log.est <= 0 (as requested)
    df$color <- ifelse(df$log.est > 0, "blue", "red")
    
    # Forest plot
    ggplot(df, aes(x = log.est, y = reorder(variable, log.est), xmin = log.lower, xmax = log.upper, color = color)) +
      geom_pointrange() +
      geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
      scale_color_identity() +  # use the "blue"/"red" strings as literal colors
      labs(title = "Forest Plot with Color-Coded Estimates",
           x = "Log Estimate",
           y = "Variable") +
      theme_minimal() +
      theme(axis.text.y = element_text(size = 10), axis.title.y = element_blank())
    

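  • If you would rather keep the forestplot package: as far as I can tell, the colors given to `fpColors()` are recycled per confidence band, not per row, so with a single band per row only the first color ("blue") is used, which matches the output in the question. A possible workaround is to pass one drawing function per row through `fn.ci_norm`, each with its color fixed. This is a sketch that assumes your installed forestplot version accepts a list of functions (one per row) for `fn.ci_norm` and that `fpDrawNormalCI()` takes `clr.line`/`clr.marker` arguments; check `?forestplot` and `?fpDrawNormalCI` if it errors. The names `row_colors` and `ci_fns` are only illustrative.

```r
library(forestplot)

# Data from the question
df <- structure(list(variable = c("N.Acetylputrescine", "Homocitrulline", "Argininic.acid",
               "SM.C16.1", "Oxalic.acid", "Cer.d18.1.22.0.", "Citrulline",
               "X3.Hydroxybutyric.acid", "Glycine", "Cer.d18.1.25.0.",
               "Uridine", "Cer.d18.1.24.1.", "Deoxyguanosine", "Adenine"),
  log.est = c(18.12, 11.70, 11.61, 9.95, 8.79, 8.72, 7.07, 4.13, 2.63,
              -5.85, -6.47, -6.81, -10.47, -14.84),
  p.value = c(9.49e-05, 0.000196, 0.0117, 0.137, 7.44e-05, 0.251, 0.514,
              0.000162, 0.0376, 0.909, 0.000858, 0.345, 0.000531, 1.4e-05),
  log.lower = c(17.12, 10.62, 9.44, -8.30, 7.81, -8.23, -8.08, 3.08,
                -1.49, -10.03, -7.14, -8.43, -11.12, -15.37),
  log.upper = c(18.71, 12.31, 12.44, 11.16, 9.38, 10.16, 9.08, 4.74,
                3.59, 9.86, -5.19, 6.91, -9.27, -13.97)
), class = "data.frame", row.names = c(NA, -14L))

# One color per row: blue for log.est > 0, red for log.est <= 0
row_colors <- ifelse(df$log.est > 0, "blue", "red")

# Build one CI-drawing function per row (hypothetical helper list `ci_fns`).
# Each wrapper swallows the clr.line/clr.marker values forestplot passes in
# and calls fpDrawNormalCI() with that row's own color instead.
ci_fns <- lapply(row_colors, function(clr) {
  force(clr)
  function(..., clr.line, clr.marker) {
    fpDrawNormalCI(..., clr.line = clr, clr.marker = clr)
  }
})

forestplot(
  labeltext = cbind("Variable" = df$variable,
                    "Coefficient" = round(df$log.est, 2),
                    "P-value" = format.pval(df$p.value)),
  mean = df$log.est,
  lower = df$log.lower,
  upper = df$log.upper,
  xlab = "Effect Size",
  title = "Logistic Regression",
  boxsize = 0.3,
  fn.ci_norm = ci_fns,               # per-row drawing functions
  col = fpColors(zero = "black")     # only the zero line color is set here
)
```

Compared with the ggplot2 version, this keeps forestplot's text columns (Variable, Coefficient, P-value) next to the plot, at the cost of a slightly more indirect way of setting colors.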