Search code examples
rdata-cleaningdata-wrangling

Filtering for minimum values in multiple columns in R


sorry in advance if this answer isn't formatted well, I'm quite new to R and the SO community, I'd welcome constructive criticism. I have a data frame that looks like this and am trying to filter it so it only contains the minimum 'Cars' and 'Houses' for each person.

my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"), 
"Cars" = c(2, 3, NA, NA, 4, 1), 
"Houses" = c(NA, NA, 4, 3, 2, NA))
#Name   Cars   Houses
#1  Dora    2     NA
#2  Dora    3     NA
#3  John   NA      4
#4  John   NA      3
#5 Marie    4     2
#6 Marie    1     NA

I want to end up with something like this (especially note the Marie row has changed, but it's ok if that's split on 2 separate rows as well):

#Name   Cars   Houses
#Dora    2     NA
#John   NA     3
#Marie   1     2

OR like this:

#Name   Cars   Houses
#Dora    2     NA
#John   NA      3
#Marie   NA     2
#Marie    1     NA

Based on other answers, I've tried

my_data %>%
group_by(Name) %>%
filter(Cars == min(Cars))
#Name   Cars    Houses
#Dora   2       NA
#Marie  1       NA

but this results in the John rows being dropped before I can filter the minimum Houses. Does anyone have any suggestions for how to approach this? Thanks in advance.


Solution

  • We can use summarise to get the minimum of each column for each name:

    my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"), 
    "Cars" = c(2, 3, NA, NA, 4, 1), 
    "Houses" = c(NA, NA, 4, 3, 2, NA))
    
    library(dplyr)
    my_data %>% 
      group_by(Name) %>% 
      summarise(Cars = min(Cars, na.rm = TRUE),
                Houses = min(Houses, na.rm = TRUE))
    
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 3 x 3
      Name   Cars Houses
      <chr> <dbl>  <dbl>
    1 Dora      2    Inf
    2 John    Inf      3
    3 Marie     1      2