Tags: r, group-by, dplyr, data-manipulation, summarize

dplyr summarize grouped data with another column


I have a data frame, pop.subset:

state  location   pop
WA     Seattle    100
WA     Kent       20
OR     foo        30
CA     foo2       80

For each state, I need the city with the lowest population, stored in a data.frame. So far I have:

result <- pop.subset %>% 
          group_by(state) %>%
          summarise(min = min(pop))

This returns the data.frame:

state   min
WA      20
...    .... etc

But I need the city too. I tried including location in the grouping, group_by(state, location), but this gives the minimum for each city within a state rather than the lowest-population city for each state, like so:

state location pop
WA    Seattle  100
WA    Kent     20
foo   foo      foo

Is there a simple solution I am missing? I want my result to look like this:

state location pop
WA    Kent     20
...   ...      ... etc.

Solution

  • I think you want to group by state, then filter for min(pop):

    pop.subset %>% 
      group_by(state) %>% 
      filter(pop == min(pop)) %>%
      ungroup()
    
    # A tibble: 3 x 3
      state location   pop
      <chr>    <chr> <int>
    1    WA     Kent    20
    2    OR      foo    30
    3    CA     foo2    80
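
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example data from the question and applies the filter approach. The object names by_filter and by_slice are made up for illustration. The second pipeline is an alternative not mentioned above: slice_min(), which assumes dplyr 1.0.0 or later. With with_ties = FALSE it keeps exactly one row per state, whereas filter(pop == min(pop)) keeps every row tied for the minimum.

```r
library(dplyr)

# Rebuild the example data from the question.
pop.subset <- data.frame(
  state    = c("WA", "WA", "OR", "CA"),
  location = c("Seattle", "Kent", "foo", "foo2"),
  pop      = c(100L, 20L, 30L, 80L),
  stringsAsFactors = FALSE
)

# Within each state, keep the row(s) whose pop equals the group minimum.
# Ties are all kept, so a state can contribute more than one row.
by_filter <- pop.subset %>%
  group_by(state) %>%
  filter(pop == min(pop)) %>%
  ungroup()

# Alternative (assumes dplyr >= 1.0.0): slice_min() picks the smallest pop
# per state; with_ties = FALSE forces exactly one row per group.
by_slice <- pop.subset %>%
  group_by(state) %>%
  slice_min(pop, n = 1, with_ties = FALSE) %>%
  ungroup()

print(by_filter)
print(by_slice)
```

Which one to use depends on whether tied cities should all be reported or only one of them.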