Search code examples
rdataframemaxdata-manipulation

Compare two observations and keep the higher Vale


I have a DF that looks like this;

ID      Station   Date       Value
111-001 Place1   2022-05-01  39.1
111-002 Place2   2022-05-01  0.01
222-001 Place1   2022-05-02  8.5
222-002 Place2   2022-05-02  40.3

I'd like to compare the similar IDs and keep the higher of the two. How is the best method to achieve this? I prefer using packages to base and any help is appreciated.


Solution

  • Use susbtr to group_by the 3 first numbers of ID, then use max:

    library(dplyr)
    dat %>% 
      group_by(gp = substr(ID, 1, 3)) %>% 
      mutate(max = max(Value))
    
    # A tibble: 4 × 6
    # Groups:   gp [2]
      ID      Station Date       Value gp      max
      <chr>   <chr>   <chr>      <dbl> <chr> <dbl>
    1 111-001 Place1  2022-05-01 39.1  111    39.1
    2 111-002 Place2  2022-05-01  0.01 111    39.1
    3 222-001 Place1  2022-05-02  8.5  222    40.3
    4 222-002 Place2  2022-05-02 40.3  222    40.3
    

    In base R, use ave:

    with(dat, ave(Value, substr(ID, 1, 3), FUN = max))
    #[1] 39.1 39.1 40.3 40.3