This is my data
df <- data.frame(
city = c("London", "Paris", "Rome", "Madrid", "Venice", "Bern"),
Spring = c(10, 3, 6, 9, 23, 8),
Summer = c(1, 5, 6, 4, 30, 12),
Fall = c(22, 24, 15, 4, 12, 8),
Winter = c(0, 12, 4, 22, 7, 9),
check.names=F
) %>%
janitor::adorn_totals(c("row")) %>%
janitor::adorn_percentages("col") %>%
janitor::adorn_pct_formatting(digits = 2) %>%
janitor::adorn_ns(position = "front")
I want to create data frame below, which is simply showing what the two top mostly visited cities at each season are.
This is what I tried
semi_output <- df %>%
filter(city != 'Total')%>%
pivot_longer(cols = -city) %>%
group_by(name) %>%
slice_max(value, n = 2, with_ties = FALSE) %>%
unite(city, c("city", "value"), sep = '-')
It seems like it works in terms of capturing the top 2 cities for each season (which is weird! These cells are not numbers they contain characters such as "(" and "%" is this reliable always?How is R able to sort them?)
But my main question now is how to convert this format to what I showed above as my desired output?
We can use simpler functions if we do the calculations first and the formatting / display options at the end.
library(dplyr); library(tidyr)
df %>%
pivot_longer(-city) %>%
mutate(pct = value / sum(value), .by = name) %>%
slice_max(value, n = 2, by = name) %>%
mutate(row = row_number(), .by = name) %>%
transmute(row, name, label = paste0(city, " (", sprintf("%0.2f%%", pct*100), ")")) %>%
pivot_wider(names_from = name, values_from = label)
Result
#A tibble: 2 × 5
row Spring Summer Fall Winter
<int> <chr> <chr> <chr> <chr>
1 1 Venice (38.98%) Venice (51.72%) Paris (28.24%) Madrid (40.74%)
2 2 London (16.95%) Bern (20.69%) London (25.88%) Paris (22.22%)