Search code examples
rggplot2plotbar-charterrorbar

R - ggplot2 - Add arrow if geom_errorbar outside limits when x-axis is a factor variable


I want to use geom_segment to replace error bars with arrows when the error exceeds a certain limit. I found a previous post that addresses this question: R - ggplot2 - Add arrow if geom_errorbar outside limits

The code works well, except that my x-axis is a factor variable instead of a numeric variable. Using position_dodge within the geom_segment statement makes the arrows start in the correct location, but it doesn't change the terminal point (xend) and all arrows point towards one central point on the x-axis instead of going straight up from the origins.

Instead of recoding the x-axis to be numeric (I will use this code to create many plots that have a range of x-axis values, with the last numeric value always ending in "+"), is there a way to correct this within geom_segment?

Code used:

data$OR.95U_u = ifelse(data$OR.95U > 10, 10 , NA)

ggplot(data, aes(x = numAlleles, y = OR, fill = Outcome)) + 
  geom_bar(position = position_dodge(.5), stat = "identity", width = .4, color = "black") + geom_hline(yintercept = 1, linetype = "dashed", color = "black") + 
  ylim(0,10) + geom_errorbar(aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(.5)) +
  theme(legend.key = element_blank(), text = element_text(size = 11.5), legend.title = element_blank()) + 
  labs(x = "Number of rare alleles") +
  scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) + 
  geom_segment(aes(x = numAlleles, xend = numAlleles, y = OR, yend = OR.95U_u), position = position_dodge(.5), arrow = arrow(length = unit(0.3, "cm")))

Resulting figure

Resulting Figure


Solution

  • Ok, after investigating a bit, I didn't find a clean way of doing this, at it seems that position_dodge only change the x aes, and not the xend aes. position_nudge also don't work here, as it moves all the arrows at the same time. So I came with a dirty way of doing this. All we need is create a new variable with the desired xend position for the geom_segment. I try and came with a semi-automtized way of doing it, for any number of levels of the coloring variable, and also created a reproducible dataset to work with, as I'm sure this could be improved a lot by people with more knowledge than me. The code has inline comments expalining the steps:

    library(tidyverse)
    
    # dummy data (tried to replicate your plot data more or less accurately)
    df <- tibble(
      numAlleles = rep(c("1", "2+"), each = 4),
      Outcome = rep(LETTERS[1:4], 2),
      OR = c(1.4, 1.5, 1.45, 2.3, 3.8, 4.2, 4.0, 1.55),
      OR.95U = c(1.9,2.1,1.9,3.8,12,12,12,12),
      OR.95L = c(0.9, 0.9, 0.9, 0.8, NA, NA,NA,NA)
    ) %>%
      mutate(
        OR.95U_u = if_else(OR.95U > 10, 10, NA_real_)
      )
    
    # as it seems that position_dodge in a geom_segment only "dodge" the x aes and
    # not the xend aes, we need to supply a custom xend. Also, we need to try
    # to automatize the position, for more classes or different dodge widths.
    # To do that, lets start with some parameters:
    # position_dodge width
    position_dodge_width <- 0.5
    # number of bars per x axis class
    bars_per_class <- length(unique(df$Outcome))
    # total space available per class. In discrete vars, this is 1 au (arbitrary unit)
    # for each class, but position_dodge only use the fraction of that unit
    # indicated in the width parameter, so we need to calculate the real
    # space available:
    total_space_available <- 1 * position_dodge_width
    # now we calculate the real bar width used by ggplot in these au, dividing the
    # space available by the number of bars to plot for each class
    bar_width_real <- (total_space_available / bars_per_class)
    # position_dodge with discrete variables place bars to the left and to the right of the
    # class au value, so we need to know when to place the xend to the left or
    # to the right. Also, the number of bars has to be taken in to account, as
    # in odd number of bars, one is located on the exact au value
    if (bars_per_class%%2 == 0) {
      # we need an offset, as bars are wider than arrows, and we want them in the
      # middle of the bar
      offset_segment <- bar_width_real / 2
      # offset modifier to know when to substract or add the modifier
      offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
      # we also need to know how meny bars to the left and how many to the right,
      # but, the first bar of each side is already taken in account with the offset,
      # so the bar modifier has to have one bar less for each side
      bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
    } else {
      # when odd number of columns, the offset is the same as the bar width
      offset_segment <- bar_width_real
      # and the modifiers have to have a middle zero value for the middle bar
      offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
      bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
    }
    
    # finally we create the vector of xend values needed:
    df %>%
      mutate(
        numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
      )
    
    ggplot(df, aes(x = numAlleles, y = OR, fill = Outcome)) + 
      geom_bar(
        position = position_dodge(position_dodge_width), stat = "identity",
        width = 0.4, color = "black"
      ) +
      geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
      ylim(0,10) +
      geom_errorbar(
        aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
      ) +
      theme(
        legend.key = element_blank(), text = element_text(size = 11.5),
        legend.title = element_blank()
      ) + 
      labs(x = "Number of rare alleles") +
      scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) + 
      geom_segment(
        aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
        position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
      )
    

    And the plot: four_bars

    We can check that for three levels discrete variables also works:

    df_three_bars <- df %>% filter(Outcome != 'D')
    bars_per_class <- length(unique(df_three_bars$Outcome))
    total_space_available <- 1 * position_dodge_width
    bar_width_real <- (total_space_available / bars_per_class)
    if (bars_per_class%%2 == 0) {
      offset_segment <- bar_width_real / 2
      offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
      bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
    } else {
      offset_segment <- bar_width_real
      offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
      bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
    }
    df_three_bars <- df_three_bars %>%
      mutate(
        numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
      )
    
    ggplot(df_three_bars, aes(x = numAlleles, y = OR, fill = Outcome)) + 
      geom_bar(
        position = position_dodge(position_dodge_width), stat = "identity",
        width = 0.4, color = "black"
      ) +
      geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
      ylim(0,10) +
      geom_errorbar(
        aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
      ) +
      theme(
        legend.key = element_blank(), text = element_text(size = 11.5),
        legend.title = element_blank()
      ) + 
      labs(x = "Number of rare alleles") +
      scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) + 
      geom_segment(
        aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
        position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
      )
    

    enter image description here