Search code examples
rsortinggroup-byminimum

How to select minimum TRUE values sorted by other variables


I have a very large set of data, that is sorted by facility number, year, and month. In the final column I have a set of TRUE/FALSE statements. I have this final column set up so that for each unique set of (facility number, year, and month) the column will start off as FALSE, but transitions to some point at TRUE (which is determined elsewhere in my program).

I am looking to find the minimum row number in each unique set of (facility number, year, and month) where the final column switches from FALSE to TRUE.

Here is a sample of the table:

     fac_num year month     t
  1:       1 2017     1 FALSE
  2:       1 2017     1 FALSE
  3:       1 2017     1 FALSE
  4:       1 2017     1 FALSE
  5:       1 2017     1 FALSE
 ---                         
2466726:      62 2017     4  TRUE
2466727:      62 2017     4  TRUE
2466728:      65 2017     1  TRUE
2466729:      65 2017     5  TRUE
2466730:      65 2017     5  TRUE

So far I have been able to manage only how to get the lowest row number of TRUE value overall, but not for each unique set of (facility number, year, and month).

min(which(dat0a$t))

The above code returns

64

Which is the correct row number for the lowest TRUE value in (fac_num = 1, year = 2017, month = 1), but I would like to know how to repeat this for any (facility number, year, and month) combination without having to manually enter every combination as there are hundreds of potential combinations.


Solution

  • Here's a solution using dplyr:

    library(dplyr)
    
    dat0a %>%
      group_by(fac_num, year, month) %>%
      summarize(row_n = min(row_number()[t == TRUE]))
    

    First setup your groups using group_by, then take the minimum row_number() where t == TRUE within each group.