
Ordering of R geom_bar plot


I have a dataset (1000 IDs, 9 classes) similar to this one:

ID     Class     Value
1      A         0.014
1      B         0.665
1      C         0.321
2      A         0.234
2      B         0.424
2      C         0.342
...    ...       ...

The Value column holds (relative) abundances, i.e. the values of all classes for one individual sum to 1.

I would like to create a ggplot2 geom_bar plot in R where the x axis is ordered not by ID but by decreasing class abundance, similar to this one:

[image: example of the desired plot]

In our example, let's say that Class B is the most abundant class across all individuals, followed by Class C and finally Class A. The first bar on the x axis would then be the individual with the highest Class B value, the second bar would be the individual with the second-highest Class B value, and so on.

This is what I tried:

ggplot(df, aes(x=ID, y=Value, fill=Class)) +
  geom_bar(stat="identity") +
  xlab("") +
  ylab("Relative Abundance\n")

Solution

  • You can do the reordering before passing the result to ggplot():

    library(dplyr)
    library(ggplot2)
    
    # sum the abundance for each class, across all IDs, & sort the result
    sort.class <- df %>% 
      count(Class, wt = Value) %>%
      arrange(desc(n)) %>%
      pull(Class)
    
    # get ID order, sorted by each ID's abundance in the most abundant class
    ID.order <- df %>%
      filter(Class == sort.class[1]) %>%
      arrange(desc(Value)) %>%
      pull(ID)
    
    # factor ID / Class in the desired order
    df %>%
      mutate(ID = factor(ID, levels = ID.order)) %>%
      mutate(Class = factor(Class, levels = rev(sort.class))) %>%
      ggplot(aes(x = ID, y = Value, fill = Class)) +
      geom_col(width = 1) #geom_col is equivalent to geom_bar(stat = "identity")
    

    [image: resulting plot]

    Sample data:

    library(dplyr)  # needed for %>%, group_by(), mutate(), arrange()
    library(tidyr)
    
    set.seed(1234)
    df <- data.frame(
      ID = seq(1, 100),
      A = sample(seq(2, 3), 100, replace = TRUE),
      B = sample(seq(5, 9), 100, replace = TRUE),
      C = sample(seq(3, 7), 100, replace = TRUE),
      D = sample(seq(1, 2), 100, replace = TRUE)
    ) %>%
      gather(Class, Value, -ID) %>%
      group_by(ID) %>%
      mutate(Value = Value / sum(Value)) %>%
      ungroup() %>% 
      arrange(ID, Class)
    
    > df
    # A tibble: 400 x 3
          ID Class  Value
       <int> <chr>  <dbl>
     1     1 A     0.143 
     2     1 B     0.357 
     3     1 C     0.429 
     4     1 D     0.0714
     5     2 A     0.176 
     6     2 B     0.412 
     7     2 C     0.294 
     8     2 D     0.118 
     9     3 A     0.2   
    10     3 B     0.4   
    # ... with 390 more rows
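
To check the ordering logic by hand, here is a minimal sketch on a tiny data frame. IDs 1 and 2 use the values from the question; ID 3 is made up purely for illustration, and the names `toy`, `class_order`, `id_order` and `toy_plot` are hypothetical, not part of the answer above. It follows the same two steps (rank the classes by total abundance, then rank the IDs by their value in the top class), using `group_by()` / `summarise()` in place of `count(wt = ...)`.

```r
library(dplyr)
library(ggplot2)

# Toy data: IDs 1 and 2 are taken from the question, ID 3 is invented
toy <- data.frame(
  ID    = rep(1:3, each = 3),
  Class = rep(c("A", "B", "C"), times = 3),
  Value = c(0.014, 0.665, 0.321,
            0.234, 0.424, 0.342,
            0.100, 0.800, 0.100),
  stringsAsFactors = FALSE
)

# Classes from most to least abundant, summed over all IDs
class_order <- toy %>%
  group_by(Class) %>%
  summarise(total = sum(Value), .groups = "drop") %>%
  arrange(desc(total)) %>%
  pull(Class)

# IDs sorted by decreasing value in the most abundant class
id_order <- toy %>%
  filter(Class == class_order[1]) %>%
  arrange(desc(Value)) %>%
  pull(ID)

# Turn both columns into factors so ggplot2 respects the chosen order
toy_plot <- toy %>%
  mutate(
    ID    = factor(ID, levels = id_order),
    Class = factor(Class, levels = rev(class_order))
  ) %>%
  ggplot(aes(x = ID, y = Value, fill = Class)) +
  geom_col(width = 1)

print(class_order)  # classes, most abundant first
print(id_order)     # IDs in the order they appear on the x axis
```

Note that this sorts on the top class only. IDs that tie on that class keep whatever relative order they had in the input, so a tie-breaking rule would need additional sort keys in `arrange()`.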