I need to build a graph of bar chart and line graph with independent Y-axis (the secondary axis is not related to the primary axis).
The example dataset:
QY = c("3Q22", "4Q22", "1Q23",
"2Q23", "3Q23")
shipment = c(50400.10,
60268.39,
46853.50,
45867.33,
50555.89)
growth = c(0.15759993,
0.19579914,
-0.22258583,
-0.02104799,
0.10222003)
df <- data.frame(as.factor(QY), shipment, growth)
A function that I preused to standardized all graphs to our in-house style:
mystyle <- function(){
font <- "Calibri" #assign font family up front
theme_bw() %+replace% #replace elements we want to change
theme(
#grid elements
panel.grid.major.x = element_blank(), #strip major gridlines
panel.grid.minor = element_blank(), #strip minor gridlines
panel.border = element_blank(), #strip box
# axis.ticks.y = element_blank(), #strip y axis ticks
axis.line = element_line(color = 'black'), #add axis line
#text elements
plot.title = element_text( #title
family = font, #set font family
size = 12, #set font size
face = 'bold', #bold typeface
hjust = 0, #left align
vjust = 2), #raise slightly
plot.subtitle = element_text( #subtitle
family = font, #font family
size = 10), #font size
plot.caption = element_text( #footer
family = font, #font family
size = 9, #font size
hjust = c(1,0)), #right align
axis.title = element_text( #axis titles
family = font, #font family
face = "bold",
size = 10), #font size
axis.text = element_text( #axis text
family = font, #axis family
size = 9), #font size
axis.text.x = element_text( #margin for axis text
margin=margin(5, b = 10)),
axis.title.x = element_blank(), #remove x axis title
legend.text = element_text(
family = font,
size = 9),
legend.title = element_blank()
#since the legend often requires manual tweaking
#based on plot content, don't define it here
)
}
Following the historical question on ggplot with 2 y axes, I understand that ggplot actually did not accept dual independent y axes. But I must rebuild this graph exactly as it is. This is a matter of automating existing graph from excel to R.
So I follow the most relevant answer to mine, on Climatograph dataset that calculate the scale.
ylim.prim <- c(0, 70) # for shipment
ylim.sec <- c(-0.25, 0.25) # for growth
#following formula in Stack Overflow eventhough I don't understand it
b <- diff(ylim.prim)/diff(ylim.sec)
a <- ylim.prim[1] - b*ylim.sec[1]
ggplot(df, aes(QY, shipment)) +
geom_col(fill = "purple") +
geom_line(aes(y = a + growth*b), color = "green") +
scale_y_continuous("Thousands of units",
labels = function(x)x/1000, #show thousands only
expand = expansion(add = 0), #remove gap
sec.axis = sec_axis(~ (. - a)/b, name = "Growth",
labels = scales::percent_format(scale = 0.01))) +
ggtitle("Total shipments") +
mystyle()
The line graph is not showing, the factor level is wrong and I have no idea what have I done to the secondary axis.
Any help in making this graph closer to the images is appreciated. In particular, how to calculate the scale for secondary axis.
I wonder if you are over-complicating things a bit.
Firstly, ensure that your year - quarters are factored in the correct order:
df <- data.frame(
QY = factor(1:5, labels = c("3Q22", "4Q22", "1Q23", "2Q23", "3Q23")),
shipment = c(50400.1, 60268.39, 46853.5, 45867.33, 50555.89),
growth = c(0.15759993, 0.19579914, -0.22258583, -0.02104799, 0.10222003)
)
When you plot, you can make sure the labels are correct on your primary axis by dividing by 1000 to give you the "thousands" shown on your Excel plot.
To get the second axis right, you just need to multiply the values in the growth column so that the bottom number is more than zero, and the top number is close to 60.
For example, you can multiply by 120 and add 40:
ggplot(df, aes(QY, shipment/1000)) +
geom_col(fill = "purple") +
geom_line(aes(y = growth * 120 + 40, group = 1), color = "green") +
scale_y_continuous('Shipments (thousands of units)',
sec.axis = sec_axis(~(.x - 40)/120,
labels = scales::percent)) +
ggtitle("Total shipments") +
theme_bw()
If you want to fully automate the process, you can use the following two functions. The first one takes the primary and secondary variables, and returns an appropriately transformed version of the secondary variable. The second function returns a formula that reverses this process for the secondary axis:
make_trans <- function(var1, var2) {
mult <- max(var1)/diff(range(var2))
add <- -min(var2)*mult
var2 * mult + add
}
reverse_trans <- function(var1, var2) {
mult <- max(var1)/diff(range(var2))
add <- -min(var2)*mult
as.formula(paste0('~(.x - ', add, ')/', mult))
}
You could use them in your case like this:
ggplot(df, aes(QY, shipment)) +
geom_col(fill = "purple") +
geom_line(aes(y = make_trans(shipment, growth), group = 1), color = "green") +
scale_y_continuous('Shipments',
sec.axis = sec_axis(reverse_trans(df$shipment, df$growth),
labels = scales::percent)) +
ggtitle("Total shipments") +
theme_bw()