r, tidyverse, distinct-values

How can I pull the most counted values for each level of a variable?


Original dataset

I want to keep only the most frequent value for each level of a variable. My code is below:

`

a <- format_separated %>% 
  group_by(state, format) %>% 
  summarise(total = n(),
            .groups = "drop") %>% 
  arrange(desc(total)) 

`

State Format Total
California Public radio 25
New York Country 17
Ohio Classical 14
New York Public radio 12

(1015 entries)

But I just want the most frequent format for each state, like this:

State Format Total
California Public radio 25
New York Country 17
Ohio Classical 14
Florida Public radio 11

(46 entries)

The final dataset should include each of the 50 US states exactly once, with no state repeated.


Solution

    library(tidyverse)
    
    df <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-08/state_stations.csv")
    
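    df %>%
      count(state, format, sort = TRUE) %>%  # stations per state/format, largest counts first
      group_by(state) %>%
      slice_head() %>%                       # first (largest) row per state; on ties only one format is kept
      arrange(-n)                            # order states by their top count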
    
    # A tibble: 50 x 3
    # Groups:   state [50]
       state      format      n
       <chr>      <chr>   <int>
     1 Texas      Country   148
     2 California Variety   116
     3 Kentucky   Country    71
     4 Tennessee  Country    68
     5 Missouri   Country    66
     6 Minnesota  Country    62
     7 Illinois   Country    59
     8 New_York   Country    52
     9 Arkansas   Country    51
    10 Georgia    Country    51
    # ... with 40 more rows
    # i Use `print(n = ...)` to see more rows
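The answer above relies on the rows already being sorted before `slice_head()` picks the first one per state. A possible alternative, sketched here on a small made-up tibble (the `stations` data and the `top_formats` name are hypothetical, not the real `state_stations.csv` counts), is to let `slice_max()` select the largest count within each state directly. Setting `with_ties = FALSE` is an assumption about what is wanted: it keeps one row per state even when two formats share the top count.

```r
library(dplyr)

# Hypothetical toy data standing in for state_stations.csv (not the real counts)
stations <- tibble(
  state  = c("Ohio", "Ohio", "Ohio", "Texas", "Texas", "Texas", "Texas"),
  format = c("Classical", "Classical", "Country",
             "Country", "Country", "Variety", "Variety")
)

top_formats <- stations %>%
  count(state, format, name = "total") %>%         # stations per state/format
  group_by(state) %>%
  slice_max(total, n = 1, with_ties = FALSE) %>%   # one top row per state, even on ties
  ungroup() %>%
  arrange(desc(total))                             # largest top counts first

print(top_formats)
```

If tied formats should all be reported, dropping `with_ties = FALSE` would return every format that shares the top count, at the cost of some states appearing more than once.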