r ggplot2 bar-chart visualization yaxis

R & ggplot2: 100% geom_bar + geom_line for average using secondary y axis


As the title says, I'm trying to plot a 100% stacked bar chart with a line on top showing the average of all observations. Because the magnitudes differ, I want to show the two on separate axes. I would normally plot this in Power BI, but the default visuals do not support this type of combo chart, and I keep failing to scale the data appropriately.

Here's some sample data and a mockup of what I want to achieve: (image: mockup)

And here's the input code in R:

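# input
library(dplyr)    # provides %>%, group_by(), summarize()
library(ggplot2)  # needed for the plotting code further down

X_labs <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
Grouping <- c("a", "a", "b", "a", "a", "b", "a", "b", "b")
Value <- c(2, 3, 8, 3, 1, 7, 2, 9, 20)
dataset <- data.frame(X_labs, Grouping, Value)

# average Value per X_labs, and the largest of those averages
df_avg <- dataset %>%
  group_by(X_labs) %>%
  summarize(Value_Avg = mean(Value, na.rm = TRUE))
max_avg <- max(df_avg$Value_Avg)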

When I run just the bar chart part, it looks the way I want: (image: bar chart)

However, the combined plot is misscaled: (image: combo)

I must be misunderstanding the proper way to set up a secondary axis: (image: messy scales)

ggplot() + 
  geom_bar(data= dataset,      
               aes(x = X_labs, y = Value, fill = Grouping ), position = "fill", stat = "identity") +
  # scale_y_continuous(labels = scales::percent_format()) +
  geom_line(data=df_avg, aes(x=X_labs, y=Value_Avg, group=1), color="black") +
  # geom_point(data=df_avg, aes(x=X_labs, y=Value_Avg), color="black")
  scale_y_continuous(name="First", labels = scales::percent, 
                     sec.axis = sec_axis(trans = ~.*max_avg, name="Second"), 
                     limits = c(0,100))

Any thoughts on how to achieve the desired effect?

I tried using sec.axis but failed to set it up correctly.


Solution

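  • ggplot2's secondary axes are just a decoration: sec.axis only relabels the primary axis and never rescales the data drawn on it, so you still need to scale your secondary-axis data to where you want it on the primary axis. Here position = "fill" puts the bars on a 0-1 scale, so divide Value_Avg by max_avg in geom_line and let sec_axis multiply by max_avg to label the axis in the original units. Also drop limits = c(0,100): on a 0-1 scale that stretches the percent axis to 10,000%.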

    ggplot() +
      # bars: position = "fill" rescales each stack to the 0-1 range
      geom_bar(data = dataset,
               aes(x = X_labs, y = Value, fill = Grouping),
               position = "fill", stat = "identity") +
      # line: divide by max_avg so the averages also land in 0-1
      geom_line(data = df_avg, aes(x = X_labs, y = Value_Avg / max_avg, group = 1), color = "black") +
      # geom_point(data = df_avg, aes(x = X_labs, y = Value_Avg / max_avg), color = "black") +
      # secondary axis: multiply by max_avg to label in the original units
      scale_y_continuous(name = "First", labels = scales::percent,
                         sec.axis = sec_axis(trans = ~ . * max_avg, name = "Second"))
    

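To see why dividing by max_avg works, it may help to check the numbers outside of ggplot2. The following is a minimal base R sketch using the question's sample data; the names avg, scaled and recovered are introduced here for illustration only, and tapply() stands in for the group_by/summarize step.

```r
# Sample data from the question
X_labs <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
Value  <- c(2, 3, 8, 3, 1, 7, 2, 9, 20)

# Per-group averages (base R stand-in for group_by() + summarize())
avg     <- tapply(Value, X_labs, mean)
max_avg <- max(avg)

# Forward transform: what geom_line() should plot on the 0-1 primary axis
scaled <- avg / max_avg

# Inverse transform: what sec_axis(~ . * max_avg) applies to the axis labels
recovered <- scaled * max_avg

print(round(avg, 2))
print(round(scaled, 2))

# The round trip should give back the original averages
stopifnot(isTRUE(all.equal(recovered, avg)))
```

The averages are roughly 4.33, 3.67 and 10.33, which map to roughly 0.42, 0.35 and 1 on the primary axis. Dividing by max_avg makes the largest average touch the top of the bars; a different divisor would leave headroom, as long as the same factor is used in sec_axis.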