How to get a complete vector of breaks from the scale of a plot in R?


I am trying to add captions outside the plots using the solutions from this post and this one.

I think I managed to get what I want, but I would like to automate the code so that it keeps working when the data changes. My problem now is that I need a way to get the vector of all the values/breaks on the y-axis of the plot. I don't want to change the y-axis, and the range alone is not enough (I found this post on getting the ranges, but that is not what I need).

On the other hand, I found this post, but the solution doesn't work with newer versions of ggplot2 (mine is 3.3.5).

This is my example:

library(ggplot2)
library(dplyr)

# DATA
val1 <- c(2.1490626,2.2035281,1.5927854,3.1399245,2.3967338,3.7915825,4.6691277,3.0727319,2.9230937,2.6239759,3.7664386,4.0160378,1.2500835,4.7648343,0.0000000,5.6740227,2.7510256,3.0709322,2.7998003,4.0809085,2.5178086,5.9713330,2.7779843,3.6724801,4.2648527,3.6841084,2.5597235,3.8477471,2.6587736,2.2742209,4.5862788,6.1989269,4.1167091,3.1769325,4.2404515,5.3627032,4.1576810,4.3387921,1.4024381,0.0000000,4.3999099,3.4381837,4.8269218,2.6308474,5.3481382,4.9549753,4.5389650,1.3002293,2.8648220,2.4015338,2.0962332,2.6774765,3.0581759,2.5786137,5.0539080,3.8545796,4.3429043,4.2233248,2.0434363,4.5980727)
val2 <- c(3.7691229,3.6478055,0.5435826,1.9665861,3.0802654,1.2248374,1.7311236,2.2492826,2.2365337,1.5726119,2.0147144,2.3550348,1.9527204,3.3689502,1.7847986,3.5901329,1.6833872,3.4240479,1.8372175,0.0000000,2.5701453,3.6551315,4.0327091,3.8781182)
val3 <- c(2.1490626,2.2035281,1.5927854,3.1399245,2.3967338,3.7915825,4.6691277,3.0727319,2.9230937,2.6239759,3.7664386,4.0160378,1.2500835,4.7648343,0.0000000,5.6740227,2.7510256,3.0709322,2.7998003,4.0809085,2.5178086,5.9713330,2.7779843,3.6724801,4.2648527,3.6841084,2.5597235,3.8477471,2.6587736,2.2742209,4.5862788,6.1989269,4.1167091,3.1769325,4.2404515,5.3627032,4.1576810,4.3387921,1.4024381,0.0000000,4.3999099,3.4381837,4.8269218,2.6308474,5.3481382,4.9549753,4.5389650,1.3002293,2.8648220,2.4015338,2.0962332,2.6774765,3.0581759,2.5786137,5.0539080,3.8545796,4.3429043,4.2233248,2.0434363,4.5980727)

df1 <- data.frame(value = val1)   
df2 <- data.frame(value = val2)   
df3 <- data.frame(value = val3)   

data <- bind_rows(lst(df1, df2, df3), .id = 'id')
data$Sex <- rep(c("Male", "Female"), times=72)
data$d <- "ff"
data <- as.data.frame(unclass(data), stringsAsFactors = TRUE)

# PLOT

p <- data %>% 
  ggplot(aes(value)) +
  geom_density(lwd = 1.2, colour="red", show.legend = FALSE) +
  geom_histogram(aes(y = after_stat(density), fill = id), bins=10, col="black", alpha=0.2) +
  facet_grid(id ~ Sex ) +
  xlab("type_data") + 
  ylab("Density") +
  ggtitle("title") +
  guides(fill=guide_legend(title="legend_title")) +
  theme(strip.text.y = element_blank())

p

# ADD CAPTION

caption_df <- data.frame(
  value = c(min(data$value), max(data$value)),
  id    = rep(tail(levels(data$id), n = 1), times = length(levels(data$Sex))),
  Sex   = levels(data$Sex)
)

p + coord_cartesian(clip = "off", 
                    ylim = layer_scales(p)$y$range$range, 
                    xlim = layer_scales(p)$x$range$range) +
  geom_text(data = caption_df,
            aes(y = -0.15, label = c(levels(data$Sex))))

[Figure: the plot before adding the caption]

[Figure: the plot after adding the caption]

The idea is that I want to avoid having to set the y parameter by hand every time I change the data. Imagine that the y-axis is different (something like 0.0000, 0.0005, 0.0010, 0.0015). In that case the appropriate y would be -0.0005, because the "jump" between breaks is 0.0005 and I just have to make it negative.

For that reason, I was wondering whether it is possible to get the COMPLETE vector of values from the y-axis. For example, for the plots above, all the values/breaks of the y-axis would be c(0.0, 0.2, 0.4, 0.6).

Does anyone know if I can get ALL the values from the y-axis of a plot?

Thanks in advance


Solution

  • You can get the y axis breaks from the p object like this:

    as.numeric(na.omit(layer_scales(p)$y$break_positions()))
    #> [1] 0.0 0.2 0.4 0.6
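
    If you do want a position derived from the breaks, as described in the question, the spacing between breaks can be computed with `diff()`. The following is only a minimal sketch on a toy plot built from `mtcars`; the names `p_demo`, `y_breaks`, `y_step` and `y_caption` are illustrative, and it assumes the breaks are evenly spaced (true for the default linear breaks, but not for a log scale, for example).

    ```r
    library(ggplot2)

    # Toy plot standing in for `p` (assumption: any density histogram will do)
    p_demo <- ggplot(mtcars, aes(mpg)) +
      geom_histogram(aes(y = after_stat(density)), bins = 10)

    # Breaks of the y scale; breaks outside the limits come back as NA, so drop them
    y_breaks <- as.numeric(na.omit(layer_scales(p_demo)$y$break_positions()))

    # Spacing between consecutive breaks (assumes evenly spaced breaks)
    y_step <- diff(y_breaks)[1]

    # One break step below the lowest break
    y_caption <- min(y_breaks) - y_step
    ```

    With the breaks from the question's plot (0.0, 0.2, 0.4, 0.6), the step is 0.2, so the resulting position would be -0.2.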
    

    However, if you want the labels to be a fixed distance below the panel regardless of the y axis scale, it would be best to use a fixed fraction of the entire panel range rather than the breaks:

    yrange <- layer_scales(p)$y$range$range
    ypos <- min(yrange) - 0.2 * diff(yrange)
    
    p + coord_cartesian(clip = "off", 
                        ylim = layer_scales(p)$y$range$range, 
                        xlim = layer_scales(p)$x$range$range) +
      geom_text(data = caption_df,
                aes(y = ypos, label = c(levels(data$Sex))))
    


    For example, suppose you had a y scale that was twice the size:

    p <- data %>% 
      ggplot(aes(value)) +
      geom_density(lwd = 1.2, colour="red", show.legend = FALSE) +
      geom_histogram(aes(y = after_stat(2 * density), fill = id), bins=10, col="black", alpha=0.2) +
      facet_grid(id ~ Sex ) +
      xlab("type_data") + 
      ylab("Density") +
      ggtitle("title") +
      guides(fill=guide_legend(title="legend_title")) +
      theme(strip.text.y = element_blank())
    

    Then the exact same code would give you the exact same label placement, without any reference to breaks:

    yrange <- layer_scales(p)$y$range$range
    ypos <- min(yrange) - 0.2 * diff(yrange)
    
    p + coord_cartesian(clip = "off", 
                        ylim = layer_scales(p)$y$range$range, 
                        xlim = layer_scales(p)$x$range$range) +
      geom_text(data = caption_df,
                aes(y = ypos, label = c(levels(data$Sex))))
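
    Since the same two lines are repeated for every plot, they could be wrapped in a small function. This is only a sketch: `caption_ypos()` is a hypothetical helper, not part of ggplot2, and the two toy plots built from `mtcars` stand in for the original plot and its doubled version.

    ```r
    library(ggplot2)

    # Hypothetical helper: a y position that sits a fixed fraction of the
    # y scale's range below its minimum, whatever the scale happens to be.
    caption_ypos <- function(plot, frac = 0.2) {
      yrange <- layer_scales(plot)$y$range$range
      min(yrange) - frac * diff(yrange)
    }

    # Two toy plots whose y scales differ by a factor of two
    p1 <- ggplot(mtcars, aes(mpg)) +
      geom_histogram(aes(y = after_stat(density)), bins = 10)
    p2 <- ggplot(mtcars, aes(mpg)) +
      geom_histogram(aes(y = after_stat(2 * density)), bins = 10)

    ypos1 <- caption_ypos(p1)
    ypos2 <- caption_ypos(p2)

    # The offset grows with the scale, so the placement relative to the
    # panel is the same in both plots.
    all.equal(ypos2, 2 * ypos1)
    ```

    The result of `caption_ypos(p)` can then be passed as `y` to `geom_text()` in the same way as `ypos` above.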
    
