I have the following script to create a figure:
histogram_endo_cases_controls <- ggplot(prs_data, aes(x = normalized, fill = as.factor(endometriosis))) +
  geom_histogram(position = "identity", alpha = 0.5, binwidth = 0.2, color = "black") +
  scale_fill_manual(values = c("blue", "yellow"),
                    labels = c("Controls", "Cases"),
                    name = "Group") +
  labs(title = "Histogram of Polygenic Risk Scores",
       x = "PRS",
       y = "Frequency") +
  theme_minimal()
The data I am plotting are scores for two groups, cases and controls. Cases are coded as 1, and controls are coded as 0.
I would like to plot the percentage of individuals with each score, computed separately within cases and within controls, because the group sizes differ greatly (there are far more controls). The plot would then look like the attached example, with the y axis showing a percentage so that the two histograms have comparable heights.
Reproducible example: For both cases and controls:
data_all <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(304000, 100500, 50300, 55400, 12)
)
Just cases:
data_cases <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(4000, 500, 300, 400, 2)
)
Just controls:
data_controls <- data.frame(
  x = c(0.00, -0.54, 1.35, 1.23, -2.34),
  y = c(300000, 100000, 50000, 55000, 10)
)
As you can see, these are counts of individuals rather than percentages. When I plot the groups separately, the bars for the cases are very low, so the difference between the two distributions cannot be seen.
Instead of ggplot2::geom_histogram(), use ggplot2::geom_density(alpha = .5).
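A minimal sketch of both options follows, assuming ggplot2 3.3.0 or later (for after_stat()) and the scales package. The prs_data frame below is simulated purely for illustration and is not the asker's data; only the column names normalized and endometriosis are taken from the question. The first plot uses geom_density(), which scales each group to unit area. The second keeps the histogram but maps y to after_stat(density * width), which should give the share of each group falling in each bin, since stat_bin computes density within each group.

```r
library(ggplot2)

set.seed(1)

# Simulated stand-in for prs_data (hypothetical values):
# many controls (0) and few cases (1), as in the question.
prs_data <- data.frame(
  normalized    = c(rnorm(5000, mean = 0), rnorm(100, mean = 0.5)),
  endometriosis = c(rep(0, 5000), rep(1, 100))
)

# Option 1: density curves, each group scaled to area 1.
p_density <- ggplot(prs_data, aes(x = normalized, fill = as.factor(endometriosis))) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("blue", "yellow"),
                    labels = c("Controls", "Cases"),
                    name = "Group") +
  labs(title = "Density of Polygenic Risk Scores",
       x = "PRS",
       y = "Density") +
  theme_minimal()

# Option 2: histogram with within-group percentages on the y axis.
# density * width equals count / sum(count), computed per fill group.
p_percent <- ggplot(prs_data, aes(x = normalized, fill = as.factor(endometriosis))) +
  geom_histogram(aes(y = after_stat(density * width)),
                 position = "identity", alpha = 0.5,
                 binwidth = 0.2, color = "black") +
  scale_fill_manual(values = c("blue", "yellow"),
                    labels = c("Controls", "Cases"),
                    name = "Group") +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(title = "Histogram of Polygenic Risk Scores",
       x = "PRS",
       y = "Percentage within group") +
  theme_minimal()
```

Printing p_density or p_percent draws the plot. In the second plot the bar heights of each group sum to 100%, so cases and controls are comparable even though their sample sizes differ.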