Search code examples
rdataframegroup-by

How do I format a column based on grouped values?


I have a data frame that looks like the top table of the attached image.

I have groups of names where a name can belong to one of the following categories: category 1, category 2, and special.

I want to output a data frame where only one category is present for each name based on the highest number in the number column.

If an special category is present, then that name should be a special category regardless of the number value.

Please refer to the bottom table of the attached image as an example of the output I am looking for.

my_data:

df <- structure(list(Name = c("Name A", "Name A", "Name B", "Name B", 
"Name C", "Name C", "Name C"), Category = c("Category 1", "Category 2", 
"Category 1", "Category 2", "Category 1", "Category 2", "Special"
), Number = c(10L, 5L, 3L, 20L, 3L, 25L, 4L)), class = "data.frame", row.names = c(NA, 
-7L))

enter image description here


Solution

  • Another tidyverse method that maintains the row ordering:

    df %>%
      group_by(Name)%>%
      mutate(Category = ifelse("Special"%in%Category ,'Special', 
                               Category[which.max(Number)]))
    # A tibble: 7 × 3
    # Groups:   Name [3]
      Name   Category   Number
      <chr>  <chr>       <int>
    1 Name A Category 1     10
    2 Name A Category 1      5
    3 Name B Category 2      3
    4 Name B Category 2     20
    5 Name C Special         3
    6 Name C Special        25
    7 Name C Special         4