
Programmatically label multiple ablines in R ggplot2


There are existing questions asking about labeling a single geom_abline() in ggplot2:

None of these cover my use case: I want to add multiple reference lines to a scatter plot so that points can be easily categorized by slope range. Here is a reproducible example of the plot:

library(ggplot2)

set.seed(123)
df <- data.frame(
  x = runif(100, 0, 1),
  y = runif(100, 0, 1))

lines <- data.frame(
  intercept = rep(0, 5),
  slope = c(0.1, 0.25, 0.5, 1, 2))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(aes(intercept = intercept, slope = slope),
              linetype = "dashed", data = lines)
p

[Figure: scatter plot with dashed ablines added at various slopes]

As I found no way to do this programmatically via the other questions, I "scaled" the manual approach via a data frame, using trial and error to figure out reasonable label positions.

labels <- data.frame(
  x = c(rep(1, 3), 0.95, 0.47),
  y = c(0.12, 0.28, 0.53, 1, 1),
  label = lines$slope)

p + geom_text(aes(label = label), color = "red", data = labels)

[Figure: plot with ablines labeled with their slope value in red text]

Is there a better way than trial and error? While this wasn't too bad with 5 lines, I still had to redo the tweaking on export, because the plot's aspect ratio and spacing differed between prototyping in an R session and the generated image. Programmatic labeling would be a huge help.

Some thoughts:

  • I wondered if there could be a label-position parameter in the range c(0, 1), corresponding to the position along the line
  • could the min/max x/y positions be extracted from the ggplot2 object internals (which I'm not familiar with) as a "cheat" for figuring out the position? Essentially if I know the pixel location of (0, intercept), I already know the slope, so for this example, I just need to know the pixel position of max(x) or max(y), depending on where we hit the perimeter
  • this struck me as similar to ggrepel, which figures out how to label points while trying to avoid overlaps

Solution

  • This was a good opportunity to check out the new geomtextpath, which looks really cool. It's got a bunch of geoms to place text along different types of paths, so you can project your labels onto the lines.

    However, I couldn't figure out a good way to set the hjust parameter the way you wanted: the text is aligned based on the range of the plot rather than the path the text sits along. In this case, the default hjust = 0.5 means the labels are at x = 0.5 (because the x-range is 0 to 1; different range would have a different position). You can make some adjustments but I pretty quickly had labels leaving the range of the plot. If being in or around the middle is okay, then this is an option that looks pretty nice.

    library(ggplot2)
    library(geomtextpath)
    library(dplyr)
    
    # identical setup from the question
    
    p +
      geom_textabline(aes(intercept = intercept, slope = slope, label = as.character(slope)),
                      data = lines, gap = FALSE, offset = unit(0.2, "lines"), text_only = TRUE)
    

    Alternatively, since you've already got the equations of your lines, you can do some algebra to find your coordinates. Solve for x where y is at its max, and solve for y where x is at its max; for each of those, use pmin to cap the result so it stays within the plot. For example, the line with slope = 0.5 won't hit y = 1 until x = 2, which is outside the chart, so cap it at the plot's max x. How you define that max can differ: it could be the maximum in the data, which you could also extract from the saved plot object (not sure if there are cases where these wouldn't be the same), or it could be extracted from the panel layout or breaks. There are more ideas in the question "How can I extract plot axes' ranges for a ggplot2 object?". That's up to you.

    # y = intercept + slope * x
    xmax <- max(df$x)
    # or layer_scales(p)$x$get_limits()[2] for data range
    # or ggplot_build(p)$layout$panel_params[[1]]$x.range[2] for panel range
    # (use $y and y.range for the corresponding ymax alternatives)
    ymax <- max(df$y)
    lines_calc <- lines %>%
      mutate(xcalc = pmin((ymax - intercept) / slope, xmax),
             ycalc = pmin(intercept + slope * xmax, ymax))
    
    p +
      geom_text(aes(x = xcalc, y = ycalc, label = as.character(slope)),
                data = lines_calc, vjust = 0, nudge_y = 0.02)
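
    The two pmin() calls above assume positive slopes. As a rough generalization, here is a minimal sketch that clips each line to the plot rectangle and places the label at a fraction t along the visible segment (t = 0 at the left end, t = 1 at the right end), which is close to the c(0, 1) parameter the question asks about. The helper name abline_label_pos and its arguments are made up for illustration and are not part of ggplot2; it also assumes plain Cartesian coordinates with untransformed axes.

    ```r
    # Hypothetical helper (not part of ggplot2): label position for the line
    # y = intercept + slope * x inside the rectangle xrange x yrange.
    # t is the fraction along the visible segment: 0 = left end, 1 = right end.
    abline_label_pos <- function(intercept, slope, xrange, yrange, t = 1) {
      # recycle so intercept and slope have the same length
      n <- max(length(intercept), length(slope))
      intercept <- rep_len(intercept, n)
      slope <- rep_len(slope, n)

      # x values where each line crosses the bottom and top edges of the panel
      x_bottom <- (yrange[1] - intercept) / slope
      x_top <- (yrange[2] - intercept) / slope

      # horizontal lines never cross those edges: they span the full width if
      # they lie inside the y range and are invisible otherwise
      flat <- slope == 0
      inside <- intercept >= yrange[1] & intercept <= yrange[2]
      x_bottom[flat] <- ifelse(inside[flat], -Inf, Inf)
      x_top[flat] <- Inf

      # visible segment = overlap of xrange with the interval between crossings
      x_lo <- pmax(xrange[1], pmin(x_bottom, x_top))
      x_hi <- pmin(xrange[2], pmax(x_bottom, x_top))
      visible <- x_lo <= x_hi

      # NA for lines that never enter the rectangle
      x <- ifelse(visible, x_lo + t * (x_hi - x_lo), NA_real_)
      data.frame(x = x, y = intercept + slope * x)
    }

    # same lines as in the question
    lines <- data.frame(intercept = rep(0, 5), slope = c(0.1, 0.25, 0.5, 1, 2))

    # label at the right-hand end of each line within the unit square
    pos <- abline_label_pos(lines$intercept, lines$slope,
                            xrange = c(0, 1), yrange = c(0, 1))
    ```

    To use this with the plot above, xrange and yrange could be taken from the built plot (the x.range and y.range entries of ggplot_build(p)$layout$panel_params[[1]]), and cbind(lines, pos) passed as the data for geom_text(). Because the positions are computed in data coordinates, they should not need re-tweaking when the export size or aspect ratio changes. A value such as t = 0.9 keeps the labels off the panel edge.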