
Fill geom_bar() by one variable and facet by count


I have the following data, stored as df:

structure(list(validated_1 = c("sombra", "sombra", "sombra", 
"sombra", "sombra", "sombra", "sombra", "sombra", "sombra", "sombra", 
"coscinodiscus", "sombra", "coscinodiscus", "coscinodiscus", 
"sombra", "coscinodiscus", "sombra", "coscinodiscus", "sombra", 
"coscinodiscus", "coscinodiscus", "detritos", "detritos", "coscinodiscus", 
"appendicularia", "detritos", "coscinodiscus", "coscinodiscus", 
"detritos", "coscinodiscus", "langanho", "detritos", "copepodo", 
"langanho", "copepodo", "langanho", "langanho", "coscinodiscus", 
"coscinodiscus", "coscinodiscus"), validated_2 = c("sombra", 
"sombra", "sombra", "sombra", "sombra", "sombra", "sombra", "sombra", 
"sombra", "sombra", "coscinodiscus", "sombra", "coscinodiscus", 
"coscinodiscus", "sombra", "coscinodiscus", "sombra", "coscinodiscus", 
"sombra", "coscinodiscus", "coscinodiscus", "detritos", "detritos", 
"coscinodiscus", "zooplâncton", "detritos", "coscinodiscus", 
"coscinodiscus", "detritos", "coscinodiscus", "langanho", "detritos", 
"zooplâncton", "langanho", "zooplâncton", "langanho", "langanho", 
"coscinodiscus", "coscinodiscus", "coscinodiscus")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -40L))

I summarise the data like this and generate a plot:

df %>% 
  group_by(validated_1) %>% 
  summarise(count = n()) %>%
  arrange(desc(count)) %>% 
  mutate(groups = c(rep("high N", 2), rep("lower N", 4))) %>% 
  ggplot(aes(x = reorder(validated_1, -count), y = count)) +
  geom_bar(stat = 'identity') +
  facet_wrap(~ groups, nrow = 2, scales = "free") +
  geom_text(aes(label = count), vjust = -0.5, size = 3)

With this approach I can facet by count, but I cannot fill the bars by the groups in the variable validated_2.

Another approach I tried was:

df %>%
  ggplot(aes(x = fct_infreq(validated_1), fill = validated_2)) +
  geom_bar()

This way I can fill the bars. However, I don't know how to facet the data by count or how to add the count above each bar. I have also noticed that this approach is much slower than the first one (without the fill) for huge datasets (more than 10 million rows).

Thanks all


Solution

  • Add validated_2 to the group_by() so that it is still present in the dataset after summarising and can be mapped to fill. You can also simplify this step by switching to dplyr::count():

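    library(dplyr)
    library(ggplot2)

    df %>%
      # One row per validated_1 / validated_2 pair, largest count first
      count(validated_1, validated_2, sort = TRUE, name = "count") %>%
      # Hard-coded labels: assumes exactly six rows after counting,
      # which holds for the sample data above
      mutate(groups = c(rep("high N", 2), rep("lower N", 4))) %>%
      ggplot(aes(x = reorder(validated_1, -count), y = count)) +
      # geom_col() plots the pre-computed counts as they are
      geom_col(aes(fill = validated_2)) +
      facet_wrap(~groups, nrow = 2, scales = "free") +
      geom_text(aes(label = count), vjust = -0.5, size = 3)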
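
The hard-coded groups vector works for the sample data because every validated_1 class pairs with exactly one validated_2 class, so count() returns six rows. If one validated_1 class is split across several validated_2 classes, count() returns more rows and the mutate() step fails on the length mismatch. The following is a minimal sketch of one way around that, not part of the original answer: it derives the facet label from per-class totals and joins it back. The small df here is made-up stand-in data, and the names totals, counts and total are assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data: "coscinodiscus" is split across two
# validated_2 classes, which breaks the fixed-length groups vector.
df <- tibble::tibble(
  validated_1 = c(rep("sombra", 5), rep("coscinodiscus", 4),
                  rep("detritos", 2), "copepodo"),
  validated_2 = c(rep("sombra", 5), rep("coscinodiscus", 3), "detritos",
                  rep("detritos", 2), "langanho")
)

# Total per validated_1 class, largest first; the top two classes are
# labelled "high N", mirroring the rule used in the question.
totals <- df %>%
  count(validated_1, sort = TRUE, name = "total") %>%
  mutate(groups = if_else(row_number() <= 2, "high N", "lower N"))

# Count per validated_1 / validated_2 pair, with total and facet label attached.
counts <- df %>%
  count(validated_1, validated_2, name = "count") %>%
  left_join(totals, by = "validated_1")

p <- ggplot(counts, aes(x = reorder(validated_1, -total), y = count)) +
  geom_col(aes(fill = validated_2)) +
  # One label per bar: the stacked total, taken from the totals table
  geom_text(data = totals, aes(y = total, label = total),
            vjust = -0.5, size = 3) +
  facet_wrap(~groups, nrow = 2, scales = "free")

print(p)
```

Because the data is aggregated before plotting, ggplot2 only receives one row per class combination, which should also help with the slowdown described for very large inputs.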
    
