Search code examples
rdataframematchrowrepeat

Update table row entry by value in matching row in R


I have

mydf1 <- data.frame(unique.id = c("aa", "bb", "aa", "cc", "bb"), date = c("1/2/21", "1/4/21", "1/8/21", "1/5/21", "12/5/20"), A = c(1, 1, 1, 1, 1))

I need to update the value of A such that for any instance of multiple rows with a given unique.ID (for example rows 1 & 3), the value A is updated to a new value (2) for the row with the later date. This would result in the output of the following

mydf2 <- data.frame(unique.id = c("aa", "bb", "aa", "cc", "bb"), date = c("1/2/21", "1/4/21", "1/8/21", "1/5/21", "12/5/20"), A = c(1, 2, 2, 1, 1))

Where row 3 has A = 2 because both rows 1 and 3 have "aa" and row 3 has the later date, and rows 2 & 5 share a common unique.id but row 2 has the later date.


Solution

  • You can solve it easily if you sort the dates and count the rows grouped by the id

    library(tidyverse)
    
    mydf1 <- data.frame(unique.id = c("aa", "bb", "aa", "cc", "bb"), date = c("1/2/21", "1/4/21", "1/8/21", "1/5/21", "12/5/20"), A = c(1, 1, 1, 1, 1))
    
    mydf2 <- data.frame(unique.id = c("aa", "bb", "aa", "cc", "bb"), date = c("1/2/21", "1/4/21", "1/8/21", "1/5/21", "12/5/20"), A = c(1, 2, 2, 1, 1))
    
    mydf1 %>% 
      mutate(date = lubridate::dmy(date)) %>% 
      group_by(unique.id) %>%
      arrange(date) %>% 
      mutate(result = dplyr::row_number())
    #> # A tibble: 5 x 4
    #> # Groups:   unique.id [3]
    #>   unique.id date           A result
    #>   <chr>     <date>     <dbl>  <int>
    #> 1 bb        2020-05-12     1      1
    #> 2 aa        2021-02-01     1      1
    #> 3 bb        2021-04-01     1      2
    #> 4 cc        2021-05-01     1      1
    #> 5 aa        2021-08-01     1      2
    

    Created on 2021-01-28 by the reprex package (v0.3.0)