r, ggplot2, grob

layer_scales not detecting all breaks from ggplot


Given the following plot:

library(tidyverse)
p <- ggplot(mtcars, aes(drat, disp)) +
  geom_line()
p

layer_scales can be used to extract breaks/break positions from most ggplot objects like the one above, e.g.:

# layer_scales(p)$y$get_breaks()
as.numeric(na.omit(layer_scales(p)$y$break_positions()))
# [1] 100 200 300 400
# returns exactly the breaks that are in the plot

But when I try to extract the breaks from this plot, it doesn't work:

df <- structure(list(date = structure(c(18080, 19281, 19096, 17178, 
                                        17692, 18659, 17129, 17114, 18833, 16472), class = "Date"), yy = c(1589L, 
                                                                                                           5382L, 4504L, 595L, 1027L, 2864L, 556L, 549L, 3346L, 42L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                           -10L))
df
p1 <- ggplot(df, aes(x = date, y = yy)) +
  geom_point() 
p1


layer_scales(p1)$y$get_breaks()
# [1]    0 1000 2000 3000 4000 5000
as.numeric(na.omit(layer_scales(p1)$y$break_positions()))
# [1] 1000 2000 3000 4000 5000
# it doesn't return 0 2000 4000

Any idea why layer_scales is not working in this case?


Solution

  • The other answer given here is a perfectly reasonable work-around. As for why it happens, the answer is a bit complicated.


    Explanation

    The object returned from layer_scales(p1)$y is a ggproto object of class ScaleContinuousPosition which has been trained on the plotting data. However, it is not quite the final scale object that is used to generate the y axis in a ggplot. There is the extra step of turning it into a final, immutable scale object of class ViewScale. The main difference is that this has additionally been trained on the range and limits of the plot's co-ordinate system (including the co-ordinate expansion).

    What is happening in your second plot is that the default expansion of the y axis, which adds a small margin above and below the range of your data, is causing the y co-ordinate range to expand:

    range(df$yy)
    #> [1]   42 5382
    
    ggplot_build(p1)$layout$panel_params[[1]]$y.range
    #> [1] -225 5649
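
    The y.range above is consistent with a multiplicative expansion of 5% of the data range at each end, which is my understanding of the ggplot2 default for continuous position scales. A minimal base-R sketch of that arithmetic, assuming no custom expand argument has been set (the variable names are illustrative only):

```r
# Data range of df$yy, as reported above
data_range <- c(42, 5382)

# Assumed default multiplicative expansion for a continuous position scale
expand_mult <- 0.05

# Pad each end of the range by 5% of its width
pad <- expand_mult * diff(data_range)
expanded_range <- data_range + c(-pad, pad)
expanded_range
#> [1] -225 5649
```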
    

    This expanded range is used as the basis for computing new breaks in the function ggplot2:::view_scales_from_scale, which creates a ViewScale object from the existing scale object by calling ggplot2:::view_scale_primary:

    ggplot2:::view_scale_primary(layer_scales(p1)$y, c(-225, 5649))$get_breaks()
    #> [1]    0 2000 4000   NA
    

    The NA value is discarded, leaving you with the breaks you see on the plot.


    Solution

    The suggestion in the answer by @M-- of doing:

    as.numeric(na.omit(ggplot_build(p1)$layout$panel_params[[1]]$y$get_breaks()))
    

    works because it accesses the finalized ViewScale objects stored in the layout$panel_params member of the "ggplot_built" object created by ggplot_build(). By contrast, layer_scales() returns the trained but unfinished scale objects stored in layout$panel_scales_x and layout$panel_scales_y of that same object.
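
    The difference between the two kinds of object can be checked by inspecting their classes. This is a small sketch using the mtcars plot from the question. It assumes a reasonably recent ggplot2 in which panel_params holds ViewScale objects; the names trained_scale and view_scale are only illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(drat, disp)) +
  geom_line()

# Trained scale, as returned by layer_scales()
trained_scale <- layer_scales(p)$y

# Finalized view scale for the first panel of the built plot
view_scale <- ggplot_build(p)$layout$panel_params[[1]]$y

# Both checks should be TRUE under the assumption above
inherits(trained_scale, "ScaleContinuousPosition")
inherits(view_scale, "ViewScale")
```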

    However, you might want a little wrapper function to replace layer_scales if you don't want to have to write this complex line of code each time:

    get_plot_breaks <- function(plot) {
      
      params <- ggplot_build(plot)$layout$panel_params[[1]]
      
      list(x = c(na.omit(params$x$get_breaks())), 
           y = c(na.omit(params$y$get_breaks())))
    }
    

    This allows the accurate capture of the breaks as drawn on the plot itself:

    get_plot_breaks(p1)
    #> $x
    #>  2016  2018  2020  2022 
    #> 16801 17532 18262 18993 
    #>
    #> $y
    #> [1]    0 2000 4000
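
    One detail of the wrapper: it uses c(na.omit(...)) rather than as.numeric(na.omit(...)), which is why the x breaks keep their labels as names. The base-R sketch below illustrates the difference on a made-up named vector shaped like the x breaks. The "2024" entry and its NA value are hypothetical, added only so that na.omit() has something to drop.

```r
# Made-up named vector shaped like the x breaks, with one NA appended
breaks <- c("2016" = 16801, "2018" = 17532, "2020" = 18262,
            "2022" = 18993, "2024" = NA)

# na.omit() drops the NA but records it in an "na.action" attribute
omitted <- na.omit(breaks)
names(attributes(omitted))

kept_names <- c(omitted)           # c() drops all attributes except names
plain      <- as.numeric(omitted)  # as.numeric() drops the names as well

kept_names
plain
```

    If the names are not wanted, swapping c() for as.numeric() in the wrapper gives plain numeric vectors for both axes.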