Apologies for the awkward title; I hope it becomes clear soon.
I have data like this: People are attributed to specific Locations, and it has been recorded whether an event has been successful or the data is missing.
df <- data.frame(PersonID = c(1:20),
Location = c("B","A","D","C","A","D","C","D","A","D","B","A","D","C","A","D","C","D","A","D"),
Success = c("yes","no","yes",NA,"yes","no","no","yes",NA,"yes","no","yes",NA,"yes","no","no","yes",NA,"yes","no"))
I would like to know how each location "performs" relative to the other locations, i. e. how many valid attempts have been successful and how the location's rate of success compares to the other locations.
So in my example, location "A" has seen 5 valid attempts (1 "NA"), of which 3 were successful (60%). The other locations have a success rate of 50%, 66.7%, and 50%, an average of 55.6%. Location A is thus 4.4 percentage points higher than the other locations' average. I want to display all that information in a table just like this:
I don't have a specific preference regarding packages, but I like and know some gt
and flextable
. Alas, not enough to implement this …
Thank you all in advance!
This was not trivial in any way:
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(janitor)
library(gt)
df1 <- df %>%
group_by(Location) %>%
mutate(attempts = sum(!is.na(Success)),
yeses = sum(Success == "yes", na.rm = TRUE),
Success_rate = (yeses/attempts)*100)
df2 <- df1 %>%
summarise(avgother = mean(Success_rate)) %>%
mutate(avgother = map_dbl(row_number(), ~mean(avgother[-.x])))
)
df %>%
group_by(Location) %>%
summarise(attempts = sum(!is.na(Success)),
yeses = sum(Success == "yes", na.rm = TRUE),
Success_rate = (yeses/attempts)*100) %>%
bind_cols(avgother= round(df2$avgother, 1)) %>%
mutate(comp.avg = Success_rate - avgother) %>%
mutate(`attempts` = paste0("(N=", attempts, ")"),
Success_rate = paste0(round(Success_rate, 1), "%")) %>%
select(-yeses) %>%
mutate(comp.avg = ifelse(comp.avg >0, paste0("(+",round(comp.avg, 1),")"), paste0("(",round(comp.avg,1),")"))) %>%
t() %>%
as.data.frame() %>%
rownames_to_column("Location") %>%
row_to_names(row_number = 1) %>%
gt()