sorry in advance if this answer isn't formatted well, I'm quite new to R and the SO community, I'd welcome constructive criticism. I have a data frame that looks like this and am trying to filter it so it only contains the minimum 'Cars' and 'Houses' for each person.
my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
"Cars" = c(2, 3, NA, NA, 4, 1),
"Houses" = c(NA, NA, 4, 3, 2, NA))
#Name Cars Houses
#1 Dora 2 NA
#2 Dora 3 NA
#3 John NA 4
#4 John NA 3
#5 Marie 4 2
#6 Marie 1 NA
I want to end up with something like this (especially note the Marie row has changed, but it's ok if that's split on 2 separate rows as well):
#Name Cars Houses
#Dora 2 NA
#John NA 3
#Marie 1 2
OR like this:
#Name Cars Houses
#Dora 2 NA
#John NA 3
#Marie NA 2
#Marie 1 NA
Based on other answers, I've tried
my_data %>%
group_by(Name) %>%
filter(Cars == min(Cars))
#Name Cars Houses
#Dora 2 NA
#Marie 1 NA
but this results in the John rows being dropped before I can filter the minimum Houses. Does anyone have any suggestions for how to approach this? Thanks in advance.
We can use summarise
to get the minimum of each column for each name:
my_data = data.frame("Name" = c("Dora", "Dora", "John", "John", "Marie", "Marie"),
"Cars" = c(2, 3, NA, NA, 4, 1),
"Houses" = c(NA, NA, 4, 3, 2, NA))
library(dplyr)
my_data %>%
group_by(Name) %>%
summarise(Cars = min(Cars, na.rm = TRUE),
Houses = min(Houses, na.rm = TRUE))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
Name Cars Houses
<chr> <dbl> <dbl>
1 Dora 2 Inf
2 John Inf 3
3 Marie 1 2