I am learning tidyverse methods in R with a data frame that looks like this:
> glimpse(data)
Observations: 16
Variables: 6
$ True.species <fct> Badger, Blackbird, Brown hare, Domestic cat, Domestic d...
$ misidentified <dbl> 17, 16, 59, 20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, ...
$ missed <dbl> 61, 106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259,...
$ Total <dbl> 78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391, 77, 369...
$ PrMissed <dbl> 0.7820513, 0.8688525, 0.1060606, 0.5454545, 0.5714286, ...
$ PrMisID <dbl> 0.21794872, 0.13114754, 0.89393939, 0.45454545, 0.42857...
Here is the dput() output:
data <- structure(list(True.species = structure(c(1L, 2L, 3L, 5L, 6L,
7L, 8L, 9L, 13L, 16L, 17L, 18L, 20L, 21L, 22L, 23L), .Label = c("Badger",
"Blackbird", "Brown hare", "Crow", "Domestic cat", "Domestic dog",
"Grey squirrel", "Hedgehog", "Horse", "Human", "Jackdaw", "Livestock",
"Magpie", "Muntjac", "Nothing", "Pheasant", "Rabbit", "Red fox",
"Red squirrel", "Roe Deer", "Small rodent", "Stoat or Weasel",
"Woodpigeon"), class = "factor"), misidentified = c(17, 16, 59,
20, 12, 24, 28, 6, 3, 7, 191, 19, 110, 21, 5, 13), missed = c(61,
106, 7, 24, 16, 160, 110, 12, 15, 37, 200, 58, 259, 473, 9, 17
), Total = c(78, 122, 66, 44, 28, 184, 138, 18, 18, 44, 391,
77, 369, 494, 14, 30), PrMissed = c(0.782051282051282, 0.868852459016393,
0.106060606060606, 0.545454545454545, 0.571428571428571, 0.869565217391304,
0.797101449275362, 0.666666666666667, 0.833333333333333, 0.840909090909091,
0.51150895140665, 0.753246753246753, 0.70189701897019, 0.95748987854251,
0.642857142857143, 0.566666666666667), PrMisID = c(0.217948717948718,
0.131147540983607, 0.893939393939394, 0.454545454545455, 0.428571428571429,
0.130434782608696, 0.202898550724638, 0.333333333333333, 0.166666666666667,
0.159090909090909, 0.48849104859335, 0.246753246753247, 0.29810298102981,
0.0425101214574899, 0.357142857142857, 0.433333333333333)), row.names = c(NA,
-16L), class = "data.frame")
I managed to make a rudimentary plot of what I want with ggplot() as follows:
ggplot(data = data, aes(x = True.species, y = PrMissed)) + geom_bar(stat = "identity")
But there are three things I can't figure out how to do:
1) Stack PrMissed and PrMisID on top of each other. Note that PrMissed + PrMisID == 1 for each row in the data frame, so the final plot would have stacks of equal height, each containing two colors (how do I specify them?), one for PrMissed and another for PrMisID.
2) Order the bars by the PrMissed variable, so that Brown hare would be at one end and Small rodent at the other.
3) Add each species' Total value to its axis label, so for example Brown hare would get a label like "Brown hare (total = 66)".
I have been trying for a long time, and for the life of me I couldn't figure out an idiomatic way to do this with ggplot(). I know the answer might be simple, so please excuse my ignorance. Can anyone help? Thanks in advance.
Here's my answer, which does not require data.table and uses only ggplot2, dplyr and magrittr from the tidyverse, plus reshape2:
library(ggplot2)
library(reshape2)
library(magrittr)
library(dplyr)
# order species by PrMissed value (lowest first)
data$True.species <- factor(data$True.species,
    levels = as.character(data$True.species[order(data$PrMissed, decreasing = FALSE)]))
# reshape to have the stackable values and plot
melt(data,
     id.vars = c("True.species", "misidentified", "missed", "Total"),
     measure.vars = c("PrMissed", "PrMisID")) %>%
  # build "Species (Total = n)" labels; reorder() keeps them in the
  # True.species order, because a plain character column would be
  # sorted alphabetically by ggplot()
  mutate(x_axis_text = paste0(True.species, " (Total = ", Total, ")"),
         x_axis_text = reorder(x_axis_text, as.integer(True.species))) %>%
  ggplot(aes(x = x_axis_text, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  coord_flip()
This produces a horizontal stacked bar chart with one bar per species, each split into its PrMissed and PrMisID parts.
Breakdown of the code, point by point:
1) To stack values, they all need to be in one column, so we use melt() from the reshape2 package to reshape the data into long format with two new columns: value, holding the proportions between 0 and 1, and variable, indicating whether that number comes from PrMissed or PrMisID. Mapping fill = variable then gives each part of the stack its own color.
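To see what melt() produces, here is a small sketch on three rows copied from the question's data (the subset and the names toy and long are illustrative assumptions). It also checks that, because PrMissed + PrMisID == 1, every stack reaches a height of 1:

```r
library(reshape2)

# Three rows from the question's data, for illustration only
toy <- data.frame(
  True.species = c("Badger", "Brown hare", "Small rodent"),
  Total = c(78, 66, 494),
  PrMissed = c(61, 7, 473) / c(78, 66, 494),
  PrMisID = c(17, 59, 21) / c(78, 66, 494)
)

# Wide (one row per species) to long (one row per species and proportion)
long <- melt(toy,
             id.vars = c("True.species", "Total"),
             measure.vars = c("PrMissed", "PrMisID"))
print(long)

# The two values per species add up to 1, so every stacked bar is equally tall
print(tapply(long$value, long$True.species, sum))
```

The id columns are repeated once per measured variable, which is why three species become six rows.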
2) Before melting the data, we convert True.species into a factor whose levels are ordered by PrMissed; ggplot() draws the bars of a factor in level order. Use decreasing = TRUE to invert the order if you wish.
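The ordering step relies on plain base R factors. A small sketch using three species from the question's data (the vector names are illustrative) shows the level order with decreasing = FALSE and TRUE, how paste() and paste0() differ when building the label, and why a character label built after the ordering step loses that order unless it is turned back into an ordered factor, here with reorder() from the stats package:

```r
# Three species, their PrMissed values and totals, from the question's data
species <- c("Badger", "Brown hare", "Small rodent")
pr_missed <- c(61 / 78, 7 / 66, 473 / 494)
totals <- c(78, 66, 494)

# Levels sorted by PrMissed, lowest first
asc <- factor(species, levels = species[order(pr_missed, decreasing = FALSE)])
print(levels(asc))

# decreasing = TRUE inverts the order
desc <- factor(species, levels = species[order(pr_missed, decreasing = TRUE)])
print(levels(desc))

# paste() puts a space between every argument; paste0() does not
print(paste(species[2], "(Total = ", totals[2], ")"))
print(paste0(species[2], " (Total = ", totals[2], ")"))

# A character label has no levels of its own: factor() (and a ggplot2
# discrete scale) falls back to alphabetical order
lab <- paste0(species, " (Total = ", totals, ")")
print(levels(factor(lab)))

# reorder() restores the PrMissed order
print(levels(reorder(lab, pr_missed)))
```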
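3) The mutate() step pastes each species name together with its Total to build the axis label. Finally, coord_flip() swaps the x and y axes so that the species labels are on the y axis instead of the x axis, where they are easy to read on the left side.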
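reshape2 has since been retired in favour of tidyr. As a swapped-in alternative (not the code above), here is a minimal sketch of the same plot using tidyr's pivot_longer() and geom_col(), run on three rows of the question's data; the names plot_data, label and p, and the two fill colours, are illustrative assumptions:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Three rows from the question's data, for illustration only
data <- data.frame(
  True.species = c("Badger", "Brown hare", "Small rodent"),
  misidentified = c(17, 59, 21),
  missed = c(61, 7, 473),
  Total = c(78, 66, 494)
)
data$PrMissed <- data$missed / data$Total
data$PrMisID <- data$misidentified / data$Total

plot_data <- data %>%
  # "Species (Total = n)" labels; factor levels sorted by PrMissed (lowest first)
  mutate(label = paste0(True.species, " (Total = ", Total, ")"),
         label = factor(label, levels = label[order(PrMissed)])) %>%
  # Long format: one row per species and proportion
  pivot_longer(cols = c(PrMissed, PrMisID),
               names_to = "variable", values_to = "value")

p <- ggplot(plot_data, aes(x = label, y = value, fill = variable)) +
  geom_col() +  # shorthand for geom_bar(stat = "identity")
  # Explicit fill colours per variable; these particular colours are arbitrary
  scale_fill_manual(values = c(PrMissed = "grey60", PrMisID = "steelblue")) +
  coord_flip()
print(p)
```

The factor levels are set before pivoting, so the long table keeps the PrMissed order, and scale_fill_manual() with a named vector answers the "how do I specify the colors" part of the question.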