I am working with the Pima Indians data set (MASS::Pima.te). The goal is to plot diabetes status (Yes or No) against each of the features, then show the total and the percentage on each bar of the bar chart.
Here is the head of the data:
> head(MASS::Pima.te, n = 10)
npreg glu bp skin bmi ped age type
1 6 148 72 35 33.6 0.627 50 Yes
2 1 85 66 29 26.6 0.351 31 No
3 1 89 66 23 28.1 0.167 21 No
4 3 78 50 32 31.0 0.248 26 Yes
5 2 197 70 45 30.5 0.158 53 Yes
6 5 166 72 19 25.8 0.587 51 Yes
7 0 118 84 47 45.8 0.551 31 Yes
8 1 103 30 38 43.3 0.183 33 No
9 3 126 88 41 39.3 0.704 27 No
10 9 119 80 35 29.0 0.263 29 Yes
The data has 332 rows and 8 columns.
All is going well up to the percentage section. Two errors:
MASS::Pima.te |>
  dplyr::mutate(dplyr::across(-type, as.numeric)) |>
  tidyr::pivot_longer(-type, names_to = "var", values_to = "value") |>
  dplyr::summarise(value = sum(value), percentage = round(sum(value) / nrow(MASS::Pima.te), 2), .by = c(type, var)) |>
  ggplot2::ggplot(ggplot2::aes(x = type, y = value)) +
  ggplot2::geom_col() +
  ggplot2::geom_text(
    ggplot2::aes(label = value), vjust = -.2
  ) +
  ggplot2::geom_text(
    ggplot2::aes(label = paste0(percentage,"%"), vjust = 3, color = "white"
  )) +
  ggplot2::scale_y_continuous(expand = c(0, 0, .2, 0)) +
  ggplot2::facet_wrap(~var, scales = "free") +
  ggplot2::labs(title = "Numerical values against y")
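First, color should not be inside aes(). Second, summarise the sum for each Yes/No group before calculating the percentage, so that each percentage is that group's share of the total for the same variable.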
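The second point can be checked on a small made-up table before touching the real data. This is only a sketch in base R: the totals below are hypothetical numbers, not values from Pima.te, and ave() stands in for the grouped mutate used in the pipeline.

```r
# Hypothetical per-group totals (NOT taken from Pima.te)
totals <- data.frame(
  type  = c("No", "Yes", "No", "Yes"),
  name  = c("a", "a", "b", "b"),
  value = c(30, 10, 5, 15)
)

# Share of each type within its variable:
# proportions(x) is x / sum(x), applied separately per `name`
totals$perc <- ave(totals$value, totals$name, FUN = proportions)

totals
```

Within each variable the shares add up to 1, which is what the labels on the bars are meant to show.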
library(tidyverse)

MASS::Pima.te %>%
  pivot_longer(!type) %>%
  # total of each feature within each diabetes group
  summarise(across(value, sum), .by = c(type, name)) %>%
  # share of each group's total within the same feature
  mutate(perc = proportions(value), .by = name) %>%
  ggplot(aes(x = type, y = value)) +
  geom_col() +
  geom_text(aes(label = value), vjust = -.5) +
  # constant settings such as color and vjust go outside aes()
  geom_text(aes(label = scales::percent(perc)), vjust = 1.5, color = "white") +
  facet_wrap(~ name, scales = "free") +
  scale_y_continuous(expand = expansion(mult = c(0.1, 0.25)))