Search code examples
rdataframedplyrpivot-table

select top n values (including%) for each column of dataframe and show their relative values of a certain column in R


This is my data

df <- data.frame(
    city = c("London", "Paris", "Rome", "Madrid", "Venice", "Bern"),
    Spring = c(10, 3, 6, 9, 23, 8),
    Summer = c(1, 5, 6, 4, 30, 12),
    Fall = c(22, 24, 15, 4, 12, 8),
    Winter = c(0, 12, 4, 22, 7, 9),
    
    check.names=F
) %>%
    janitor::adorn_totals(c("row")) %>%
    janitor::adorn_percentages("col") %>%
    janitor::adorn_pct_formatting(digits = 2) %>%
    janitor::adorn_ns(position = "front")

enter image description here

I want to create data frame below, which is simply showing what the two top mostly visited cities at each season are.

enter image description here

This is what I tried

  semi_output <- df %>%
  filter(city != 'Total')%>%
  pivot_longer(cols = -city) %>%
  group_by(name) %>%
  slice_max(value, n = 2, with_ties = FALSE) %>%
  unite(city, c("city", "value"), sep = '-')

enter image description here

It seems like it works in terms of capturing the top 2 cities for each season (which is weird! These cells are not numbers they contain characters such as "(" and "%" is this reliable always?How is R able to sort them?)

But my main question now is how to convert this format to what I showed above as my desired output?


Solution

  • We can use simpler functions if we do the calculations first and the formatting / display options at the end.

    library(dplyr); library(tidyr)
    df %>%
      pivot_longer(-city) %>%
      mutate(pct = value / sum(value), .by = name) %>%
      slice_max(value, n = 2, by = name) %>%
      mutate(row = row_number(), .by = name) %>%
      transmute(row, name, label = paste0(city, " (", sprintf("%0.2f%%", pct*100), ")")) %>%
      pivot_wider(names_from = name, values_from = label)
    

    Result

    #A tibble: 2 × 5
        row Spring          Summer          Fall            Winter         
      <int> <chr>           <chr>           <chr>           <chr>          
    1     1 Venice (38.98%) Venice (51.72%) Paris (28.24%)  Madrid (40.74%)
    2     2 London (16.95%) Bern (20.69%)   London (25.88%) Paris (22.22%)