R - tidyverse/ggplot bar chart with custom discrete data labels and sorted by one variable?


I am learning tidyverse methods in R using a data frame that looks like this:

> glimpse(data)
Observations: 16
Variables: 6
$ True.species  <fct> Badger, Blackbird, Brown hare, Domestic cat, Domestic d...
$ misidentified <dbl> 17, 16, 59, 20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, ...
$ missed        <dbl> 61, 106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259,...
$ Total         <dbl> 78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391, 77, 369...
$ PrMissed      <dbl> 0.7820513, 0.8688525, 0.1060606, 0.5454545, 0.5714286, ...
$ PrMisID       <dbl> 0.21794872, 0.13114754, 0.89393939, 0.45454545, 0.42857...

Here is the dput():

data <- structure(list(True.species = structure(c(1L, 2L, 3L, 5L, 6L, 
7L, 8L, 9L, 13L, 16L, 17L, 18L, 20L, 21L, 22L, 23L), .Label = c("Badger", 
"Blackbird", "Brown hare", "Crow", "Domestic cat", "Domestic dog", 
"Grey squirrel", "Hedgehog", "Horse", "Human", "Jackdaw", "Livestock", 
"Magpie", "Muntjac", "Nothing", "Pheasant", "Rabbit", "Red fox", 
"Red squirrel", "Roe Deer", "Small rodent", "Stoat or Weasel", 
"Woodpigeon"), class = "factor"), misidentified = c(17, 16, 59, 
20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, 5, 13), missed = c(61, 
106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259, 473, 9, 17
), Total = c(78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391, 
77, 369, 494, 14, 30), PrMissed = c(0.782051282051282, 0.868852459016393, 
0.106060606060606, 0.545454545454545, 0.571428571428571, 0.869565217391304, 
0.797101449275362, 0.666666666666667, 0.833333333333333, 0.840909090909091, 
0.51150895140665, 0.753246753246753, 0.70189701897019, 0.95748987854251, 
0.642857142857143, 0.566666666666667), PrMisID = c(0.217948717948718, 
0.131147540983607, 0.893939393939394, 0.454545454545455, 0.428571428571429, 
0.130434782608696, 0.202898550724638, 0.333333333333333, 0.166666666666667, 
0.159090909090909, 0.48849104859335, 0.246753246753247, 0.29810298102981, 
0.0425101214574899, 0.357142857142857, 0.433333333333333)), row.names = c(NA, 
-16L), class = "data.frame")

I managed to make a rudimentary plot of what I want with ggplot() as follows:

ggplot(data = data, aes(x = True.species, y = PrMissed)) + geom_bar(stat = "identity")

But there are three things I can't figure out how to do:

  1. I want a stacked bar chart where the variables PrMissed and PrMisID are on top of each other. Note that PrMissed + PrMisID == 1 for each row in the data frame, so the final plot would have equally high stacks but each containing two colors (how do I specify them?), one for PrMissed and another for PrMisID.
  2. I want the order of the bars to be in ascending order of the PrMissed variable so that Brown hare would be on one end and Small rodent on the other.
  3. I prefer this plot to be "flipped" on its side so that the labels (the animal names like "Brown hare") are on the left side and easier to read. An added complexity is that rather than the labels simply saying the animal name, I want them to say the corresponding Total value, so for example Brown hare would get a corresponding axis label like "Brown hare (total = 66)".

I have been trying for a long time and for the life of me can't figure out an idiomatic way to do this with ggplot(). I know the answer might be simple, so please excuse my ignorance. Can anyone help? Thanks in advance.


Solution

  • Here's my answer, which does not require data.table. It uses ggplot2 and dplyr from the tidyverse, plus reshape2 (not a tidyverse package) for melt():

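    library(ggplot2)
    library(reshape2)
    library(magrittr)
    library(dplyr)
    # order Species by PrMissed value (ascending)
    data$True.species <- factor(data$True.species,
                                levels = data[order(data$PrMissed, decreasing = FALSE), "True.species"])

    # reshape to have the stackable values and plot
    melt(data,
         id.vars = c("True.species", "misidentified", "missed", "Total"),
         measure.vars = c("PrMissed", "PrMisID")) %>%
      # build the axis label, e.g. "Brown hare (total = 66)"
      mutate(x_axis_text = paste0(True.species, " (total = ", Total, ")")) %>%
      # make the label a factor that follows the True.species level order;
      # a plain character column would be sorted alphabetically by ggplot
      mutate(x_axis_text = factor(x_axis_text,
                                  levels = unique(x_axis_text[order(True.species)]))) %>%
      ggplot(aes(x = x_axis_text, y = value, fill = variable)) +
      geom_bar(stat = "identity") +
      coord_flip()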
    

    This produces a horizontal stacked bar chart with one bar per species, split into PrMissed and PrMisID.

    Breakdown of the code, following your three points:

    1) To be stackable, the values need to be in a single column, so melt() from the reshape2 package reshapes the data into long format and creates two new columns. One is value, containing the proportions between 0 and 1, and the other is variable, indicating whether that number belongs to PrMissed or PrMisID. Mapping fill = variable then gives each part of the stack its own color.

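    2) Before melting the data, we convert True.species into a factor whose levels are ordered by PrMissed. Use decreasing = TRUE to reverse the order. Note that ggplot sorts a character x variable alphabetically, so the label column that ends up on the axis must itself be a factor with the same level order for the sort to show in the plot.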

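    3) coord_flip() flips the x and y axes so that the species are on the y axis instead of the x axis and you can easily read them on the left side. The label text itself is built by pasting the species name and its Total value together before plotting.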
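
    The question also asks how to choose the two colors, which the code above leaves at the ggplot defaults. As a rough sketch of one alternative, the same three steps can be written with tidyr::pivot_longer() instead of melt(), reorder() for the sorting, and scale_fill_manual() for the colors. The toy data frame holds three rows copied from the question's dput(); the names toy, plot_data, label and p, and the two color names, are placeholders chosen for this illustration and are not from the original answer.

    ```r
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    # Three rows copied from the question's data (illustrative subset only)
    toy <- data.frame(
      True.species = c("Badger", "Brown hare", "Small rodent"),
      Total        = c(78, 66, 494),
      PrMissed     = c(0.782051282051282, 0.106060606060606, 0.95748987854251),
      PrMisID      = c(0.217948717948718, 0.893939393939394, 0.0425101214574899)
    )

    plot_data <- toy %>%
      mutate(
        # axis label such as "Brown hare (total = 66)"
        label = paste0(True.species, " (total = ", Total, ")"),
        # reorder() turns label into a factor with levels sorted by PrMissed (ascending)
        label = reorder(label, PrMissed)
      ) %>%
      # long format: one row per species and proportion type
      pivot_longer(c(PrMissed, PrMisID), names_to = "variable", values_to = "value")

    p <- ggplot(plot_data, aes(x = label, y = value, fill = variable)) +
      geom_col() +   # equivalent to geom_bar(stat = "identity")
      # arbitrary example colors, one per stacked variable
      scale_fill_manual(values = c(PrMissed = "steelblue", PrMisID = "darkorange")) +
      coord_flip() +
      labs(x = NULL, y = "Proportion", fill = NULL)

    print(p)
    ```

    To run this on the full data, replace toy with data. Use reorder(label, -PrMissed) if you want the bars in the opposite order.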