r, purrr

How to combine elements from a list and ensure only the latest record is used when applicable?


I am working with a list of data.frames in which each round adds a new year and drops the earliest one, so the three rounds cover 1991-1995, 1992-1996, and 1993-1997.

Here is the R code for the minimal reproducible example:

library(purrr)

# Define the year ranges:
(year_ranges <- map(0:2, \(increment) {(1991:1995) + increment}))


# Create data.frames:
(df_1 <- map(year_ranges, \(year_range) {
  
  map_dfc(year_range, \(col) {setNames(list(rnorm(n = 4)), as.character(col))})
  
}))
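
To make the overlap between rounds explicit, here is a small sketch that counts how many rounds contain each year. The name `rounds_per_year` is only illustrative and is not part of the original example.

```r
library(purrr)

# Same year ranges as in the example above
year_ranges <- map(0:2, \(increment) (1991:1995) + increment)

# Count how many rounds contain each year; a count above 1 means the
# year is reported in several rounds and needs de-duplicating
rounds_per_year <- table(unlist(year_ranges))
rounds_per_year
```

Only 1991 and 1997 appear in a single round; every other year is reported at least twice.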

I'd like to combine them into one data.frame and, where a year appears in more than one round, keep only the data from the latest round.

For example, year 1992 has two rounds of records; only the newer one should be used and the older one discarded.

How can I achieve this? Any comments are welcome.


Solution

  • Here is an option using purrr::reduce and dplyr::bind_cols:

    library(purrr)
    library(dplyr, warn.conflicts = FALSE)
    
    reduce(
      df_1,
      \(x, y) {
        dplyr::bind_cols(
          # Keep only columns from x not present in y
          x[!names(x) %in% names(y)],
          y
        )
      }
    )
    #> # A tibble: 4 × 7
    #>   `1991` `1992` `1993`   `1994`  `1995` `1996` `1997`
    #>    <dbl>  <dbl>  <dbl>    <dbl>   <dbl>  <dbl>  <dbl>
    #> 1  0.319 -0.852 -0.924 -0.00283  0.639   0.523  0.257
    #> 2  0.660 -0.385 -0.356  0.0836   0.290   0.705 -0.471
    #> 3  0.699 -1.36  -0.825  0.459   -0.319   0.800  0.778
    #> 4  1.11  -0.689 -2.08   0.425    0.0427  1.55   1.35
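
Because the example data is random, it is hard to see from the output which round each column came from. The sketch below is one way to check the "latest round wins" behaviour. It uses hypothetical toy data in which every cell of round `r` holds the value `r`, and it swaps in base R's `Reduce` and `cbind` for `purrr::reduce` and `dplyr::bind_cols`, so it runs without extra packages. The names `rounds` and `combined` are illustrative only.

```r
# Hypothetical toy data: every cell in round r holds the value r,
# so the combined result reveals which round each year came from.
year_ranges <- lapply(0:2, function(increment) (1991:1995) + increment)

rounds <- lapply(seq_along(year_ranges), function(r) {
  years <- year_ranges[[r]]
  df <- as.data.frame(matrix(r, nrow = 4, ncol = length(years)))
  names(df) <- as.character(years)
  df
})

# Same fold, written with base R: drop the columns of x that y also
# has, then append y, so the later round always wins.
combined <- Reduce(
  function(x, y) cbind(x[!names(x) %in% names(y)], y),
  rounds
)

# Round that supplied each year (all rows are equal, so row 1 suffices)
unlist(combined[1, ])
```

With this toy data, 1991 comes from round 1, 1992 from round 2, and 1993 to 1997 from round 3. This relies on the list being ordered from oldest to newest round; if it is not, sort it before folding.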