Search code examples
rggplot2chartstidyversevisualization

How to Find the Scale for Second Axis in ggplot graph


I need to build a graph of bar chart and line graph with independent Y-axis (the secondary axis is not related to the primary axis).

enter image description here

The example dataset:

QY = c("3Q22", "4Q22", "1Q23",
       "2Q23", "3Q23")
shipment = c(50400.10,
             60268.39,
             46853.50,
             45867.33,
             50555.89)
growth =  c(0.15759993,
            0.19579914,
            -0.22258583,
            -0.02104799,
            0.10222003)

df <- data.frame(as.factor(QY), shipment, growth)

A function that I preused to standardized all graphs to our in-house style:

mystyle <- function(){ 
  font <- "Calibri"   #assign font family up front
  
  theme_bw() %+replace%    #replace elements we want to change
    
    theme(
      
      #grid elements
      panel.grid.major.x = element_blank(),    #strip major gridlines
      panel.grid.minor = element_blank(),      #strip minor gridlines
      panel.border = element_blank(),          #strip box
      # axis.ticks.y = element_blank(),          #strip y axis ticks
      axis.line = element_line(color = 'black'), #add axis line
      #text elements
      plot.title = element_text(             #title
        family = font,            #set font family
        size = 12,                #set font size
        face = 'bold',            #bold typeface
        hjust = 0,                #left align
        vjust = 2),               #raise slightly
      
      plot.subtitle = element_text(          #subtitle
        family = font,            #font family
        size = 10),               #font size
      
      plot.caption = element_text(           #footer
        family = font,            #font family
        size = 9,                 #font size
        hjust = c(1,0)),               #right align
      
      axis.title = element_text(             #axis titles
        family = font,            #font family
        face = "bold",
        size = 10),               #font size
      
      axis.text = element_text(              #axis text
        family = font,            #axis family
        size = 9),                #font size
      
      axis.text.x = element_text(            #margin for axis text
        margin=margin(5, b = 10)),
      
      axis.title.x = element_blank(),        #remove x axis title
      
      legend.text = element_text(
        family = font,
        size = 9),
      
      legend.title = element_blank()
      
      #since the legend often requires manual tweaking 
      #based on plot content, don't define it here
    )
}

Following the historical question on ggplot with 2 y axes, I understand that ggplot actually did not accept dual independent y axes. But I must rebuild this graph exactly as it is. This is a matter of automating existing graph from excel to R.

So I follow the most relevant answer to mine, on Climatograph dataset that calculate the scale.

ylim.prim <- c(0, 70)   # for shipment
ylim.sec <- c(-0.25, 0.25)    # for growth
#following formula in Stack Overflow eventhough I don't understand it

b <- diff(ylim.prim)/diff(ylim.sec)
a <- ylim.prim[1] - b*ylim.sec[1]

ggplot(df, aes(QY, shipment)) +
  geom_col(fill = "purple") +
  geom_line(aes(y = a + growth*b), color = "green") +
  scale_y_continuous("Thousands of units", 
                     labels = function(x)x/1000,  #show thousands only
                     expand = expansion(add = 0),  #remove gap
                     sec.axis = sec_axis(~ (. - a)/b, name = "Growth",
                     labels = scales::percent_format(scale = 0.01))) +
  ggtitle("Total shipments") +
  mystyle()

The result: enter image description here

The line graph is not showing, the factor level is wrong and I have no idea what have I done to the secondary axis.

Any help in making this graph closer to the images is appreciated. In particular, how to calculate the scale for secondary axis.


Solution

  • I wonder if you are over-complicating things a bit.

    Firstly, ensure that your year - quarters are factored in the correct order:

    df <- data.frame(
            QY = factor(1:5, labels = c("3Q22", "4Q22", "1Q23", "2Q23", "3Q23")),
            shipment = c(50400.1, 60268.39, 46853.5, 45867.33, 50555.89), 
            growth = c(0.15759993, 0.19579914, -0.22258583, -0.02104799, 0.10222003)
          )
    

    When you plot, you can make sure the labels are correct on your primary axis by dividing by 1000 to give you the "thousands" shown on your Excel plot.

    To get the second axis right, you just need to multiply the values in the growth column so that the bottom number is more than zero, and the top number is close to 60.

    For example, you can multiply by 120 and add 40:

    ggplot(df, aes(QY, shipment/1000)) +
      geom_col(fill = "purple") +
      geom_line(aes(y = growth * 120 + 40, group = 1), color = "green") +
      scale_y_continuous('Shipments (thousands of units)',
                         sec.axis = sec_axis(~(.x - 40)/120, 
                                             labels = scales::percent)) +
      ggtitle("Total shipments") +
      theme_bw()
    

    enter image description here

    If you want to fully automate the process, you can use the following two functions. The first one takes the primary and secondary variables, and returns an appropriately transformed version of the secondary variable. The second function returns a formula that reverses this process for the secondary axis:

    make_trans <- function(var1, var2) {
      mult <- max(var1)/diff(range(var2))
      add <- -min(var2)*mult
      var2 * mult + add
    }
    
    reverse_trans <- function(var1, var2) {
      mult <- max(var1)/diff(range(var2))
      add <- -min(var2)*mult
      as.formula(paste0('~(.x - ', add, ')/', mult))
    }
    

    You could use them in your case like this:

    ggplot(df, aes(QY, shipment)) +
      geom_col(fill = "purple") +
      geom_line(aes(y = make_trans(shipment, growth), group = 1), color = "green") +
      scale_y_continuous('Shipments',
                         sec.axis = sec_axis(reverse_trans(df$shipment, df$growth),
                                             labels = scales::percent)) +
      ggtitle("Total shipments") +
      theme_bw()
    

    enter image description here