r ggplot2 bar-chart yaxis

scale_y_continuous makes graph go missing


I want to create a bar chart that has a year, value and a category dimension. The x-axis should be the different years and within a single year, I want the bars to be in ascending order.

I managed to do that, but as soon as I add scale_y_continuous() to adjust the y-axis, the plot breaks and no graph is displayed.

Below, I demonstrate the problem with sample data. The first graph is exactly what I want, except for the y-axis. I want to be able to adjust the y-axis (ticks, text, etc.). But as said, when I try to adjust it, the code stops working.

Sample code

# Example data
df <- data.frame(type = c("cat1", "cat1", "cat2", "cat2"),
                 year = c(1,2,1,2),
                 val  = c(100,70,60,100))

library(ggplot2)

# basic plot works
ggplot(df, aes(x = as.factor(year), y = reorder(val, as.factor(year)), fill = type)) +
  geom_bar(stat = "identity", position=position_dodge(0.7), width = 0.6)

# doesn't work... why??
ggplot(df, aes(x = as.factor(year), y = reorder(val, as.factor(year)), fill = type)) +
  geom_bar(stat = "identity", position=position_dodge(0.7), width = 0.6) +
  scale_y_continuous(
    expand = expansion(mult = c(0.03, 0.11)),  
    breaks = seq(0, 100, by = 10), 
    limits = c(0, max(df$val, na.rm = FALSE) + 10) )

Solution

  • Your original plot treats val as a factor, because reorder() returns a factor. That is the direct reason the second plot fails: scale_y_continuous() expects numeric values and cannot be applied to a discrete (factor) y aesthetic. It is also an unusual choice for the first plot: the values are internally converted to the integer codes {1, 2, 3}, which are what is actually plotted, with the factor labels {60, 70, 100} shown on the axis. This means that the distance between 60 and 70 on the y-axis is the same as the distance between 70 and 100, a strange graphical design decision at best and misleading at worst:

    [Figure: bar plot with categories (cat2 [blue], cat1 [red]) and bar locations factor(year) 1, 2, with y-values spaced at locations 1, 2, 3 with labels "60", "70", "100"]

    If, as @stefan suggests, you use your second bit of code with y = val rather than making y into a factor, and use tidyverse tools to make a new variable that defines your ordering (passed as the group aesthetic, which sets the order of the dodged bars within each year), you can get something more sensible ...

    library(ggplot2)
    library(dplyr)
    # sort rows by year, then by value, and freeze that order in a grouping factor
    df2 <- df |>
      arrange(year, val) |>
      mutate(group = forcats::fct_inorder(paste0(year, type)))
    # y = val stays numeric, so scale_y_continuous() applies; group sets the dodge order
    ggplot(df2, aes(x = as.factor(year), y = val, fill = type, group = group)) +
      geom_bar(stat = "identity", position = position_dodge(0.7), width = 0.6) +
      scale_y_continuous(
        expand = expansion(mult = c(0.03, 0.11)),
        breaks = seq(0, 100, by = 10),
        limits = c(0, max(df$val) + 10))
    

    [Figure: the same bar plot; the y-axis values are now at 60, 70, 100]
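If you want to check both points of this answer without drawing anything, here is a minimal base-R sketch. It uses the sample data from the question. Two things in it are assumptions of the sketch rather than part of the original code: factor() is used as a simplified stand-in for the reorder() call (both return a factor), and the ordering step is rewritten with order() and factor() instead of dplyr/forcats, with key as a hypothetical helper name.

```r
# Same sample data as in the question
df <- data.frame(type = c("cat1", "cat1", "cat2", "cat2"),
                 year = c(1, 2, 1, 2),
                 val  = c(100, 70, 60, 100))

# 1) Why the factor version cannot take a continuous scale.
# factor() is a simplified stand-in (assumption) for reorder(); both return a factor.
y_fac <- factor(df$val)
levels(y_fac)        # labels shown on the axis
as.integer(y_fac)    # integer codes that are actually plotted
is.numeric(y_fac)    # a factor is discrete, not numeric
is.numeric(df$val)   # the raw column is numeric

# 2) Base-R version of the arrange() + fct_inorder() step.
# Sort by year, then by value, and keep that row order as the factor level order.
df2 <- df[order(df$year, df$val), ]
key <- paste0(df2$year, df2$type)               # hypothetical helper name
df2$group <- factor(key, levels = unique(key))
levels(df2$group)    # ascending by val within each year
```

The level order of group is what you would pass to aes(group = ...), so that position_dodge() places the bars in ascending order within each year.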