When representing data in graphs like pie charts or stacked 100% column/bar charts, I typically like to add data labels with the absolute and percentage values of each category. However, there are MANY cases when the percentages in those labels don't add up to 100% due to rounding. Is there any way to fix this?
library(tidyverse)
# Creating a small dataset
df = data.frame(categories = c('Cat1', 'Cat2', 'Cat3', 'Cat4'),
                values = c(2200, 4700, 3000, 2000)) %>%
  mutate(perc = values / sum(values))
# Creating the data label text.
# This is the step where I need to make a change. More specifically, in the `label_perc` section.
df = df %>%
  mutate(label_values = format(values,
                               big.mark = ",",
                               decimal.mark = ".",
                               scientific = FALSE),
         label_perc = sprintf("%0.0f%%", perc * 100),
         data_label = paste(label_values, label_perc, sep = '\n'))
# Generating the pie chart graph in ggplot2
p = ggplot(df, aes(x = "", y = values, fill = categories)) +
  geom_bar(width = 1, stat = "identity") +
  geom_text(aes(label = data_label),
            position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()
Notice how the percentages don't add up to 100%: 17% + 25% + 39% + 18% = 99%.
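The shortfall can be reproduced without the plot. The snippet below is a minimal base R sketch (the variable name `rounded` is only for illustration) that rounds each share on its own, the same way the `sprintf("%0.0f%%", ...)` call does, and then sums the results:

```r
# Same values as in the dataset above
values <- c(2200, 4700, 3000, 2000)
perc <- values / sum(values)

# Round each share independently, as the label code does
rounded <- round(perc * 100)

print(rounded)       # 18 39 25 17
print(sum(rounded))  # 99
```

Each share is rounded correctly in isolation. Three of the four values are rounded down and only one is rounded up, so the total comes out one point short.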
Is there a way to generate these data labels so that the rounded percentages still add up to 100%?
The same problem happens when I'm working in Excel. When it does, I just create a new column with the rounded percentages and then, for the last category, instead of using the ROUND() function, I use 1 - SUM(...) of the rounded values above it.
This works great in Excel, but I'm not quite sure how to translate this solution into R.
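A fairly literal translation of that Excel workaround might look like the sketch below. It uses base R only, and the name `perc_excel` is hypothetical, chosen just for this example:

```r
values <- c(2200, 4700, 3000, 2000)
perc <- values / sum(values)

# Round every share to two decimals, i.e. to whole percentages
perc_excel <- round(perc, 2)

# Overwrite the last category with the remainder, like 1 - SUM(...) in Excel
n <- length(perc_excel)
perc_excel[n] <- 1 - sum(perc_excel[-n])

label_perc <- sprintf("%0.0f%%", perc_excel * 100)
print(label_perc)  # "18%" "39%" "25%" "18%"
```

One caveat with this approach is that the last category absorbs the entire rounding error: here Cat4 is labelled 18% although its true share is about 16.8%.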
I was able to find a good general solution in another SO thread (How to make rounded percentages add up to 100%) and implement it in R.
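The code below rounds the cumulative sum of the percentages and then takes the difference between consecutive rounded totals. This generates a new column called perc_rounded, which will always add up to 100%.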
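df = df %>%
  mutate(
    # Round the running total instead of each individual share
    perc_cumsum = round(cumsum(perc), 2),
    # Rounded running total of the previous row (0 for the first row)
    perc_cumsum_off = replace_na(lag(perc_cumsum, 1), 0),
    # Each rounded share is the step between consecutive rounded totals
    perc_rounded = perc_cumsum - perc_cumsum_off)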
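To connect this back to the data labels, here is a small base R sketch of the same cumulative-rounding idea wrapped in a function. The helper name `round_to_total` and its `digits` argument are my own, not something from the linked thread:

```r
# Hypothetical helper: round shares so that the rounded values still sum to 1
round_to_total <- function(perc, digits = 2) {
  cum <- round(cumsum(perc), digits)  # rounded running total
  diff(c(0, cum))                     # step between consecutive rounded totals
}

values <- c(2200, 4700, 3000, 2000)
perc_rounded <- round_to_total(values / sum(values))

label_perc <- sprintf("%0.0f%%", perc_rounded * 100)
print(label_perc)  # "18%" "40%" "25%" "17%"
```

In the data frame version, the equivalent change would be to build `label_perc` from `perc_rounded * 100` instead of `perc * 100`. Note that the adjustment lands on whichever category the running total happens to tip over (here Cat2 shows 40% for a true share of about 39.5%), so the result depends on the order of the rows.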