Tags: r, ggplot2, geom-text

Adding text (sample size) on top of stacked bar chart returns error message in ggplot R


Current figure: [image]
Desired effect: [image]

I have a stacked bar chart and want to add the sample size above each bar. I tried geom_text() with the following code:

Data %>% count(Month, Age) %>%
  group_by(Month) %>%
  mutate(percent = n/sum(n)*100) %>%
  ggplot(aes(Month, percent, fill = as.factor(Age))) +
  geom_col(position = "fill") + ylab("") +
  geom_text(aes(label = n_month, y = 1.05)) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#009E73", "#E69F00", "#0072B2")) +
  theme(axis.text = element_text(size = 17), 
        legend.text = element_text(size = 18),
        axis.title.x = element_text(margin = margin(t = 10), size = 16))

This returns an error. I understand the cause: the plotted data has 34 rows, but I only want to display 12 numbers, one per month. So far it only works when the data has exactly 12 rows (hence the "Desired effect" figure). How should I change my code?

Error: Aesthetics must be either length 1 or the same as the data (34): label
n_month
 [1] 18  8 20 18 24 34 32 15 22 26 12 13

Solution

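  • Sorry for the delay. I tried to reproduce your data, and the issue lies in the underlying data: n_month has 12 values, while the data passed to ggplot() has 34 rows, so the label aesthetic cannot be matched to the rows. For your approach it is easier to give each geom its own dataset.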

    For this example I am using the nycflights13 data, which is probably similar in structure to yours.

    Here is my setup:

    library(dplyr)
    library(ggplot2)
    library(nycflights13)
    
    graph_data <- flights %>% 
      filter(carrier %in% c("UA", "B6", "EV")) %>% 
      count(carrier, month) %>% 
      add_count(month, wt = n, name = "n_month") %>% 
      mutate(percent = n / n_month * 100) 
    

    Data looks like:

    # A tibble: 36 × 5
       carrier month     n n_month percent
       <chr>   <int> <int>   <int>   <dbl>
     1 B6          1  4427   13235    33.4
     2 B6          2  4103   12276    33.4
     3 B6          3  4772   14469    33.0
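
    To see what add_count(month, wt = n, name = "n_month") contributes, here is a minimal sketch on a made-up toy table. The names toy, group, counts and month_totals are hypothetical and only serve the illustration.

    ```r
    library(dplyr)

    # Hypothetical toy data: one row per observation.
    toy <- tibble(
      group = c("a", "a", "b", "b", "b"),
      month = c(1, 2, 1, 2, 2)
    )

    counts <- toy %>%
      count(group, month) %>%                     # n = rows per group/month pair
      add_count(month, wt = n, name = "n_month")  # n_month = sum of n within each month

    # distinct() keeps one row per month, which is what a label layer needs.
    month_totals <- distinct(counts, month, n_month)
    ```

    counts keeps one row per group/month pair and repeats the monthly total on each of them, while month_totals has exactly one row per month.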
    

    Now we supply geom_col() and geom_text() with different datasets, both derived from graph_data: the full table for the bars, and one row per month for the labels.

    
    ggplot() +
      geom_col(
        data = graph_data,
        aes(x = month, y = percent, fill = as.factor(carrier)), 
        position = "fill") + ylab("") +
      geom_text(
        data = distinct(graph_data, month, n_month),
        aes(x = month, y = 1.05, label = n_month)) +
      scale_y_continuous(labels = scales::percent) +
      scale_fill_manual(values = c("#009E73", "#E69F00", "#0072B2")) +
      theme(axis.text = element_text(size = 17), 
            legend.text = element_text(size = 18),
            axis.title.x = element_text(margin = margin(t = 10), size = 16))
    

    I tried to keep your code as intact as possible; I only added the data = ... argument to each geom and moved the aesthetics into the layers.

    Output is:

    [image]
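
The same two-dataset approach can be sketched with the column names from the question (Month, Age). The original Data is not available, so the data frame below is a hypothetical stand-in: the monthly sizes are taken from the n_month vector printed in the question, and the ages are assigned cyclically purely for illustration. The names plot_data, month_totals and p are also assumptions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's `Data`: one row per observation.
# Monthly sample sizes reuse the n_month values printed in the question.
n_per_month <- c(18, 8, 20, 18, 24, 34, 32, 15, 22, 26, 12, 13)
Data <- data.frame(Month = rep(1:12, times = n_per_month))
Data$Age <- rep_len(c(1, 2, 3), nrow(Data))  # made-up age groups

# One row per Month/Age pair, with the monthly total repeated on each row.
plot_data <- Data %>%
  count(Month, Age) %>%
  add_count(Month, wt = n, name = "n_month")

# One row per month: this is the data for the text layer.
month_totals <- distinct(plot_data, Month, n_month)

p <- ggplot() +
  geom_col(
    data = plot_data,
    aes(x = Month, y = n, fill = as.factor(Age)),
    position = "fill") +                      # bars are rescaled to 0-1
  geom_text(
    data = month_totals,
    aes(x = Month, y = 1.05, label = n_month)) +  # label just above each bar
  scale_y_continuous(labels = scales::percent) +
  ylab("")
```

Because position = "fill" rescales each bar to proportions, the raw counts n can be passed as y directly, and the text layer only ever sees 12 rows, so the length mismatch from the question does not arise.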