Search code examples
rfunctionmax

Find max value across multiple columns with ties and return column name or tiebreaker NA value


I'm struggling with finding a max() type function that doesn't return 'first', 'last' or random, and which.max.simple() which looked promising isnt available on CRAN anymore.

Chat GPT is struggling with returning this correctly despite multiple attempts so can anyone help?

library(dplyr)

# Function to find column name(s) with max value, handling tie
column_with_max <- function(...) {
  values <- c(...)
  max_value <- max(values)
  max_indices <- which(values == max_value)
  if (length(max_indices) > 1) {
    return(paste(names(values)[max_indices], collapse = ", "))
  } else {
    return(names(values)[values == max_value])
  }
}


# Sample data
data <- data.frame(group = c("A", "A", "B", "B", "C", "C"),
                   value1 = c(0, 0, 1, 0, 1, 1),
                   value2 = c(1, 1, 1, 1, 1, 1),
                   value3 = c(1, 0, 2, 0, 1, 1))

# Grouped column with max value and tie handling
result <- data %>%
  group_by(group) %>%
  summarise(max_column = column_with_max(value1, value2, value3))

desired result would be:

output <- data.frame(group = c("A", "A", "B", "B", "C", "C"),
                   value1 = c(0, 0, 1, 0, 1, 1),
                   value2 = c(1, 1, 1, 1, 1, 1),
                   value3 = c(1, 0, 2, 0, 1, 1),
                   max_column = c("tie", "value2", "value3", "value2", "tie", "tie"))

I've got multiple grouping variables and larger data so a dplyr solution would be most helpful. Thank you


Solution

    • you can use ensyms() to capture the column names, then index into them if only one max is found.
    • to get your sample output, you’ll want to use rowwise %>% mutate, not group_by %>% summarize.
    • when testing for multiple maxima, use dplyr::near() rather than == to mitigate floating point errors.
    • the below also includes NA handling.
    library(dplyr)
    
    column_with_max <- function(..., na.rm = FALSE) {
      nms <- as.character(ensyms(...))
      vals <- c(...)
      max_val <- max(vals, na.rm = na.rm)
      if (is.na(max_val)) return(NA_character_)
      max_idx <- which(near(vals, max_val))
      if (length(max_idx) > 1) return("tie")
      nms[[max_idx]]
    }
    
    data %>%
      rowwise() %>%
      mutate(max_column = column_with_max(value1, value2, value3)) %>%
      ungroup()
    

    Result:

    # A tibble: 6 × 5
      group value1 value2 value3 max_column
      <chr>  <dbl>  <dbl>  <dbl> <chr>     
    1 A          0      1      1 tie       
    2 A          0      1      0 value2    
    3 B          1      1      2 value3    
    4 B          0      1      0 value2    
    5 C          1      1      1 tie       
    6 C          1      1      1 tie