I have data like this:
class subclass percent
A X 7.75
A Y 7.75
B Z 1.25
B Z 1.25
B T 1.25
I want to plot a histogram with classes on the x-axis, percents on the y-axis, and bars filled according to the subclass. So for the given example data the histogram should have 2 bars (A and B) with 2 values on y (7.75 for A and 1.25 for B). The A bar should be split 50/50 between X and Y, and the B bar should be split into thirds (about 67% Z and 33% T).
I tried using ggplot and geom_histogram:
data %>%
ggplot(aes(x=reorder(class,-percent),
y = percent,
fill = subclass)) +
geom_histogram(stat='identity') +
scale_y_continuous(labels = scales::percent)
This code sums up the percent values for the y-axis, so instead of plotting 7.75 and 1.25, it plots 15.5 for A and 3.75 for B. Since the totals are wrong, I don't know if the fill = subclass part is working. What am I doing wrong?
Thank you!!
First, what you want is a bar chart, so use geom_col instead of geom_histogram. Second, as your percent column reflects the total percent per class, you have to divide by the number of observations per class so that the bars stack to the total. Third, I added a summarise step to compute the percent per class and subclass:
data <- structure(list(class = c("A", "A", "B", "B", "B"), subclass = c(
"X",
"Y", "Z", "Z", "T"
), percent = c(7.75, 7.75, 1.25, 1.25, 1.25)), class = "data.frame", row.names = c(NA, -5L))
library(ggplot2)
library(dplyr, warn=FALSE)
data <- data %>%
group_by(class) %>%
mutate(percent = percent / n()) %>%
group_by(class, subclass) %>%
summarise(percent = sum(percent))
#> `summarise()` has grouped output by 'class'. You can override using the
#> `.groups` argument.
ggplot(data, aes(
x = reorder(class, -percent),
y = percent,
fill = subclass
)) +
geom_col() +
scale_y_continuous(labels = scales::label_percent(scale = 1))
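As a quick sanity check on the reshaping step, the per-subclass shares can be summed back up per class to confirm that the stacked bars reach the original totals (7.75 for A and 1.25 for B). This is only a minimal sketch: the names raw, shares and totals are made up for illustration, and it rebuilds the example data so it can be run on its own.

```r
library(dplyr, warn.conflicts = FALSE)

# Example data from the question (rebuilt here so the check is self-contained)
raw <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  subclass = c("X", "Y", "Z", "Z", "T"),
  percent = c(7.75, 7.75, 1.25, 1.25, 1.25)
)

# Same reshaping as above: split each class total evenly across its rows,
# then add up the shares per class and subclass
shares <- raw %>%
  group_by(class) %>%
  mutate(percent = percent / n()) %>%
  group_by(class, subclass) %>%
  summarise(percent = sum(percent), .groups = "drop")

# Height of each stacked bar: the sum of the subclass shares per class
totals <- shares %>%
  group_by(class) %>%
  summarise(total = sum(percent))

print(totals)
```

If the totals printed here differ from the percent values in the original data, the division by n() was probably applied to data that had already been summarised.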
EDIT: To add a label with the relative frequency of each subclass per class, I would add another column to the dataset, which can then be shown as labels via geom_text:
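# Run this on the original data, not on the already summarised version,
# otherwise percent is divided by n() a second time
data <- data %>%
group_by(class) %>%
mutate(percent = percent / n()) %>%
group_by(class, subclass) %>%
summarise(percent = sum(percent)) %>%
mutate(label = percent / sum(percent))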
ggplot(data, aes(
x = reorder(class, -percent),
y = percent,
fill = subclass
)) +
geom_col() +
geom_text(aes(label = scales::percent(label)), position = position_stack(vjust = .5)) +
scale_y_continuous(labels = scales::label_percent(scale = 1))