
Change number of breaks and x-axis labels based on another column


I have a data frame with too many values on the x-axis, so I would like to reduce the number of x-axis breaks and take the labels for those breaks from another column. I found a solution here, is-it-possible-to-have-a-continuous-line-with-geom-line-across-facets-with-facete, which works when the breaks are spaced by 1 (by = 1), but when I try a larger step I get an error.

Below is modified from the linked example.

library(patchwork)
library(ggplot2)
library(scales)

df_graph_data = data.frame(
  year = c(
    rep.int("2020", times = 11),
    rep.int("2021", times = 12),
    rep.int("2022", times = 3)
  ),
  month_name = c(
    "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
    "January", "February", "March"
  ),
  month_number = c(
    "02", "03", "04", "05", "06", "07",
    "08", "09", "10", "11", "12", "01",
    "02", "03", "04", "05", "06", "07",
    "08", "09", "10", "11", "12", "01",
    "02", "03"
  ),
  number_of_queries = c(
    484819, 576697, 843015, 925175,
    1102853, 889212, 835706, 774622,
    701338, 850297, 1046064, 1273363,
    958868, 1088284, 1151606, 1666950,
    2025731, 2731704, 2429019, 3228395,
    3204915, 2612807, 2811946, 3053788,
    2589273, 2305433
  )
)

df_graph_data$rownum = 1:nrow(df_graph_data)

windows()
graph <- ggplot(df_graph_data) +
  geom_line(
    aes(x = rownum, y = number_of_queries),
    size = 1,
    colour = "blue",
    linetype = "solid"
  ) +
  scale_x_continuous(
    breaks = seq(
      min(df_graph_data$rownum),
      max(df_graph_data$rownum),
      by = 1
    ),
    labels = df_graph_data$month_number
  )

graph

This produces this graph


The data set I have is much larger, so I would need a break only every 10 rows (by = 10), but when I try this I get the following error: breaks and labels must have the same length.

I would like to find out if there is a way to introduce breaks based on one column and then set the labels from a corresponding column. So, for example, if the breaks are at rownum 10, 20, 30, then the labels should be the month_name values that correspond to those rows.


Solution

  • The idea of breaks and labels is rather straightforward: labels[i] is placed at position breaks[i].

    If you want to space your labels further apart, you can use for instance this snippet:

    brk_idx <- seq(
       min(df_graph_data$rownum),
       max(df_graph_data$rownum),
       by = 10
    )
    
    ggplot(df_graph_data) +   
       geom_line(aes(x = rownum,
                     y = number_of_queries), 
                 linewidth = 1, colour = "blue",   
                 linetype = "solid") +    
       scale_x_continuous(
          breaks = df_graph_data$rownum[brk_idx],
          labels = df_graph_data$month_number[brk_idx])
    

    Line plot with breaks at self-defined positions

    It looks up the rows given by brk_idx and uses rownum as the position and month_number as the label at that position:

    df_graph_data[brk_idx, c("rownum", "month_number")]
    #    rownum month_number
    # 1       1           02
    # 11     11           12
    # 21     21           10
    

    That is, place "02" at position 1, "12" at position 11 and "10" at position 21. (N.B. brk_idx and df_graph_data$rownum[brk_idx] are identical here, because rownum is simply 1:nrow(df_graph_data).)

    This also explains your error when you changed the by argument in seq to 10: you tried to place all month_numbers at positions 1, 11 and 21, so you had 26 labels but only 3 positions.
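
Since the question asks for month_name rather than month_number as the label, here is a minimal sketch of the same idea with that column. It rebuilds only the columns it needs so that it runs on its own; deriving month_name from R's built-in month.name constant and the helper names month_idx and brk_labels are my own choices, not part of the original code. It assumes ggplot2 3.4.0 or later because of the linewidth argument.

```r
library(ggplot2)

# Rebuild the needed columns from the question's data:
# 26 consecutive months, starting with February 2020.
month_idx <- seq_len(26) %% 12 + 1   # 2, 3, ..., 12, 1, 2, ...

df_graph_data <- data.frame(
  rownum = seq_len(26),
  month_name = month.name[month_idx],  # month.name is a base R constant
  number_of_queries = c(
    484819, 576697, 843015, 925175, 1102853, 889212, 835706,
    774622, 701338, 850297, 1046064, 1273363, 958868, 1088284,
    1151606, 1666950, 2025731, 2731704, 2429019, 3228395,
    3204915, 2612807, 2811946, 3053788, 2589273, 2305433
  )
)

# Every 10th row, starting at the first one: 1, 11, 21
brk_idx <- seq(min(df_graph_data$rownum), max(df_graph_data$rownum), by = 10)

# The failing call paired these 3 breaks with all 26 labels
length(brk_idx)                    # 3
length(df_graph_data$month_name)   # 26

# Subsetting the label column with the same index keeps both aligned
brk_labels <- df_graph_data$month_name[brk_idx]
stopifnot(length(brk_labels) == length(brk_idx))

graph <- ggplot(df_graph_data) +
  geom_line(aes(x = rownum, y = number_of_queries),
            linewidth = 1, colour = "blue", linetype = "solid") +
  scale_x_continuous(breaks = df_graph_data$rownum[brk_idx],
                     labels = brk_labels)

graph
```

To get breaks at rows 10, 20, ... instead, as described in the question, start the sequence at 10, for example seq(10, max(df_graph_data$rownum), by = 10); the label lookup stays the same.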