Search code examples
rdplyrtidyversewindow-functions

Find most frequent value but when there is a tie, choose the most recent in tidyverse


So let's say I have some data like this:

ID  value  date
001     A  2015-12-06
001     A  2015-12-07
001     A  2015-12-08
002     B  2015-12-09
002     C  2015-12-10
003     A  2015-12-11
003     B  2015-12-12
002     B  2015-12-13
004     D  2015-12-13
004     R  2015-12-13

I want to find the value that appears most frequently for each ID. But when there is a tie, take the most recent date's value.

Expected Output:

ID  value
001     A
002     B
003     B
004     R

You may notice in the case of 004 there is the same date AND same ID during a tie. In this case, you can use the lower row entry.


Solution

  • You can use the following code:

    library(dplyr)
    df %>%
      group_by(ID) %>%
      mutate(n = n()) %>%
      filter(date == max(date)) %>%
      summarise(value = value[1])
    #> # A tibble: 4 × 2
    #>      ID value
    #>   <int> <chr>
    #> 1     1 A    
    #> 2     2 B    
    #> 3     3 B    
    #> 4     4 D
    

    Created on 2022-07-02 by the reprex package (v2.0.1)