I've tried everywhere to find the answer to this question but I am still stuck, so here it is:
I have a data frame data_1 that contains data from an ongoing latent profile analysis. The variables of interest for this question are profiles and gender.
I would like to plot the gender distribution by profile, but within each profile show what percentage of each gender falls in that profile, relative to the whole sample of that gender. For example, if we have 10 women in total and 5 of them are in Profile 1, I want the text on top of the women bar for Profile 1 to show 50%.
Right now I am using the following code, but it gives me the percentage relative to the entire sample, whereas I want the percentage relative to the total number of women.
ggplot(data = subset(data_1, !is.na(gender)),
       aes(x = gender, fill = gender)) +
  geom_bar() +
  facet_grid(cols = vars(profiles)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Accent', name = "Gender",
                    labels = c("Non-binary", "Man", "Woman")) +
  labs(x = "Gender", title = "Gender distribution per LPA profile") +
  geom_text(aes(y = ((..count..)/sum(..count..)),
                label = scales::percent((..count..)/sum(..count..))),
            stat = "count", vjust = -28)
Thanks in advance for your help!
I tried multiple alternatives, including creating the variable within the dataset using summarize and mutate, but with no success unfortunately.
As untidy as it seems, the best approach is likely to summarise outside of the ggplot2 call, which can be done like this:
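library(tidyverse)

set.seed(1)  # make the simulated example reproducible

# Simulated stand-in for the real data
data1 <- tibble(gender = sample(c("male", "female"), 100, replace = TRUE),
                profile = sample(c("profile1", "profile2"), 100, replace = TRUE))

data1 |>
  count(gender, profile) |>       # one row per gender x profile combination
  group_by(gender) |>
  mutate(perc = n / sum(n)) |>    # share of each profile within a gender
  ungroup() |>
  ggplot(aes(x = gender, y = n, fill = gender)) +
  geom_col() +
  facet_grid(~profile) +
  geom_text(aes(y = n + 3, label = scales::percent(perc)))  # label just above each bar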
The facet_grid is essentially grouping the dataset by profile before doing any calculations of values, so in essence it's blind to the data in the other facet. I think the only approach is thus to summarise before the call and use geom_col (which defaults to stat = "identity") to make the plots. Note that the y value for the labels is calculated from the count variable n, so the text is positioned relative to the heights of the bars.
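To make the two denominators concrete, here is a minimal base R sketch on made-up data. It mirrors the example in the question (10 women, 5 of them in one profile); the 6 men and the names toy, counts, whole_sample and within_gender are invented purely for illustration.

```r
# Hypothetical toy data: 10 women (5 in profile1) and 6 men (2 in profile1)
toy <- data.frame(
  gender  = c(rep("woman", 10), rep("man", 6)),
  profile = c(rep("profile1", 5), rep("profile2", 5),
              rep("profile1", 2), rep("profile2", 4))
)

# Cross-tabulate: rows are genders, columns are profiles
counts <- table(toy$gender, toy$profile)

# Share of the whole sample: every cell divided by the grand total (16)
whole_sample <- prop.table(counts)

# Share within each gender: every cell divided by its row total
within_gender <- prop.table(counts, margin = 1)

whole_sample["woman", "profile1"]   # 0.3125 (5 / 16)
within_gender["woman", "profile1"]  # 0.5    (5 / 10)
```

The second number is the one the labels should show; the first is what you get when the denominator is the sum of all counts.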
I tell a lie, you can actually do it in the ggplot2 call, but it's a little messier:
data1 |>
  ggplot(aes(x = gender, fill = gender)) +
  geom_bar() +
  facet_grid(~ profile) +
  stat_count(aes(y = after_stat(count) + 2,
                 label = scales::percent(after_stat(count) /
                                           tapply(after_stat(count),
                                                  after_stat(group),
                                                  sum)[after_stat(group)]
                 )),
             geom = "text")
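Code borrowed from here. The tapply(..., after_stat(group), sum) part sums the counts for each gender group across both facets, and indexing the result by after_stat(group) gives each bar its own gender's total as the denominator. Today I learned something!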
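For anyone puzzled by the tapply(...)[group] idiom, here is a rough base R sketch of what it does outside of ggplot2. The count and group vectors are hypothetical stand-ins for the values stat_count would compute (one entry per bar, across both facets), not output taken from a real plot.

```r
# Hypothetical per-bar values across two facets
count <- c(5, 2, 5, 4)   # bar heights
group <- c(1, 2, 1, 2)   # integer group id of each bar (one id per gender)

# Total count per group, summed over all bars that share a group id
totals <- tapply(count, group, sum)

# Indexing by group repeats each group's total once per bar.
# This relies on the ids being the consecutive integers 1..k.
per_bar_total <- totals[group]

# Share of each bar within its own group
share <- as.numeric(count / per_bar_total)
share  # 0.5, 0.333..., 0.5, 0.666...
```

Within each group the shares add up to 1, which is exactly the "percentage of this gender" that the labels are meant to show.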