r ggplot2 bar-chart

Select only the top 10 bars in a ggplot2 bar chart when count is not a column


I have a tibble with train route data, including whether the rider was a member or not. In a ggplot2 bar chart, I have the starting station name on x, the count on y, and the fill colour based on member status.

However, there are over 700 stations, so the chart is cluttered. I'd like to show only the top 10 (most frequent) and the bottom 10 (least frequent) stations. The issue is that I don't think I can use the standard slice_min and slice_max functions, because there is no count column: I am relying on geom_bar's default behaviour of counting rows and putting that count on the y axis.

Is there a way to select the top 10 and bottom 10 counts so the chart isn't crowded? Additionally, I'd like to show the top and bottom groups as two subplots.

# A tibble: 6 × 3
  starting_station_name                ending_station_name                      member_status
  <chr>                                <chr>                                    <chr>        
1 American University East Campus      39th & Veazey St NW                      member       
2 Washington & Independence Ave SW/HHS Independence Ave & L'Enfant Plaza SW/DOE member       
3 15th St & Massachusetts Ave SE       12th St & Pennsylvania Ave SE            member       
4 New Hampshire Ave & Ward Pl NW       14th & Rhode Island Ave NW               casual       
5 11th & Girard St NW                  Georgia & New Hampshire Ave NW           member       
6 15th & W St NW                       California St & Florida Ave NW           member  

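I'm using this code:

library(ggplot2)

# Keep columns 5, 7 and 8 (the two station names and the member status)
rides_stations <- subset(rides_cleaned, select = c(5, 7, 8))

# geom_bar() counts the rows per station itself, so no count column is needed
q1 <- ggplot(rides_stations, aes(x = starting_station_name, fill = member_status)) +
  geom_bar()

q1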

This produces a heavily overcrowded chart.


Solution

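  • library(dplyr); library(forcats); library(ggplot2)
    
    # Simulated example data: 26 stations (letters) with skewed frequencies
    data.frame(starting_station_name = sample(letters, 500, TRUE, prob = 26:1),
               member_status = sample(c("casual", "member"), 500, TRUE)) |>
      # Count explicitly so that n exists as a real column
      count(starting_station_name, member_status) |> 
      # Keep the 10 most frequent stations (weighted by n), lump the rest into
      # "Other", then order the levels by descending total count
      mutate(starting_station_name = factor(starting_station_name) |>
               fct_lump(n = 10, w = n) |>
               fct_reorder(-n, sum)) |>
      filter(starting_station_name != "Other") |>
      ggplot(aes(starting_station_name, n, fill = member_status)) + 
      geom_col()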
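
The question also asks for the bottom 10 and for two subplots. One possible way to get both, sketched here under some assumptions, is to compute the per-station totals with count(), pick the stations with slice_max() and slice_min(), and facet on a grouping column. The data frame rides_demo and its station names are made up for illustration and stand in for rides_stations; the column names total and group are likewise just names chosen for this sketch.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for rides_stations: 40 invented stations, skewed frequencies
set.seed(42)
stations <- sprintf("Station %02d", 1:40)
rides_demo <- data.frame(
  starting_station_name = sample(stations, 5000, replace = TRUE, prob = 40:1),
  member_status = sample(c("casual", "member"), 5000, replace = TRUE)
)

# Total rides per station: this is the count column slice_max()/slice_min() need
station_totals <- count(rides_demo, starting_station_name, name = "total")

# with_ties = FALSE returns exactly 10 rows even when counts are tied
top10    <- slice_max(station_totals, total, n = 10, with_ties = FALSE)
bottom10 <- slice_min(station_totals, total, n = 10, with_ties = FALSE)

# Label each selected station; the factor levels fix the panel order
ranked <- bind_rows(
  mutate(top10, group = "Top 10"),
  mutate(bottom10, group = "Bottom 10")
) |>
  mutate(group = factor(group, levels = c("Top 10", "Bottom 10")))

# Keep only rides from the selected stations, then count by member status
plot_data <- rides_demo |>
  inner_join(ranked, by = "starting_station_name") |>
  count(group, starting_station_name, total, member_status)

# One panel per group; free scales so the bottom 10 bars remain readable
p <- ggplot(plot_data,
            aes(x = reorder(starting_station_name, -total), y = n,
                fill = member_status)) +
  geom_col() +
  facet_wrap(~ group, scales = "free") +
  labs(x = "Starting station", y = "Rides") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```

To try this on the real data, rides_demo would be replaced by rides_stations. With several hundred stations, many of the least-used ones may be tied (for example at a single ride), in which case with_ties = FALSE simply keeps the first 10 of them, so the bottom panel is somewhat arbitrary.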
    
