R: How to identify indices of minima of all groups in data frame

In R, say I have a data frame times of times with columns: athlete (character), season (integer), distance (factor, out of 400, 800, 1500, 5000, 10000) and tm (float) and want to identify the indices of rows that are the lowest value of time for each unique combination of the other four variables.

I can do this with the following code that sorts by grouping columns and then by tm:

times1 <- times # make copy of array
times1$rownum <- 1:nrow(times1) # add column of row numbers
times1 <- times1[with(times1, order(athlete, season, distance, tm)), ] # sort array
whichmins <- times1$rownum[!duplicated(subset(times1, select = -c(tm, rownum)))] # identify rows where grouping factors change

But I was wondering if there was a more concise way to do it using aggregate, dplyr or data tables. I tried using dplyr's group_by function with which.min but I could not get it to work.

Thank you

Solution

With tidyverse, similar approach would be to arrange by the columns, filter the distinct elements based on the logical vector from duplicated and pull the 'rownum'

library(dplyr)
times %>%
     mutate(rownum = row_number()) %>%
     arrange(athlete, season, distance, tm) %>%
     filter(!duplicated(select(., -c(tm, rownum))) %>%
     pull(rownum)

Or instead of duplicated, use the distinct

times %>%
   mutate(rownum = row_number()) %>%
   arrange(athlete, season, distance, tm) %>%
   distinct(across(-c(tm, rownum)), .keep_all = TRUE) %>%
   pull(rownum)

If we want to use a group by operation, then after the grouping by 'athlete', 'season', 'distance', slice the row where the 'tm' is minimum and pull the 'rownum'

times %>%
    mutate(rownum = row_number())
    group_by(athlete, season, distance) %>%
    slice_min(tm) %>%
    pull(rownum)

Or with summarise

times %>%
    mutate(rownum = row_number())
    group_by(athlete, season, distance) %>%
    summarise(rownum = rownum[which.min(tm)]) %>%
    pull(rownum)

Or using data.table

library(data.table)
setDT(times)[order(athlete, season, distance, tm), 
   .I[!duplicated(.SD[, setdiff(names(.SD), 'tm')), with = FALSE])]]

Or with unique

unique(setorder(setorder(setDT(times, keep.rownames = TRUE),
      athlete, season, distance, tm), by = c('athlete', 'season', 'distance'))[, rn]