I have a data frame that looks like the top table of the attached image.
I have groups of names where a name can belong to one of the following categories: category 1
, category 2
, and special
.
I want to output a data frame where only one category is present for each name based on the highest number in the number column.
If an special category is present, then that name should be a special
category regardless of the number value.
Please refer to the bottom table of the attached image as an example of the output I am looking for.
my_data:
df <- structure(list(Name = c("Name A", "Name A", "Name B", "Name B",
"Name C", "Name C", "Name C"), Category = c("Category 1", "Category 2",
"Category 1", "Category 2", "Category 1", "Category 2", "Special"
), Number = c(10L, 5L, 3L, 20L, 3L, 25L, 4L)), class = "data.frame", row.names = c(NA,
-7L))
Another tidyverse method that maintains the row ordering:
df %>%
group_by(Name)%>%
mutate(Category = ifelse("Special"%in%Category ,'Special',
Category[which.max(Number)]))
# A tibble: 7 × 3
# Groups: Name [3]
Name Category Number
<chr> <chr> <int>
1 Name A Category 1 10
2 Name A Category 1 5
3 Name B Category 2 3
4 Name B Category 2 20
5 Name C Special 3
6 Name C Special 25
7 Name C Special 4