I am working with a series of data.frames in a list, where each round adds a new year and drops the earliest year, as shown in the picture below:
Here is the R code for the minimal reproducible example:
library(purrr)
# Define the year ranges:
(year_ranges <- map(0:2, \(increment) {(1991:1995) + increment}))
# Create data.frames:
(df_1 <- map(year_ranges, \(year_range) {
  map_dfc(year_range, \(col) {setNames(list(rnorm(n = 4)), as.character(col))})
}))
I'd like to combine them into one data.frame and keep only the data from the latest round when a year has multiple records:
For example, year 1992 has two rounds of records; only the newer one should be used (marked in light green), and the older one should be discarded.
How can I achieve this? Any comments are welcome.
Here is an option using purrr::reduce and dplyr::bind_cols:
library(purrr)
library(dplyr, warn.conflicts = FALSE)
reduce(
  df_1,
  \(x, y) {
    dplyr::bind_cols(
      # Keep only columns from x not present in y
      x[!names(x) %in% names(y)],
      y
    )
  }
)
#> # A tibble: 4 × 7
#> `1991` `1992` `1993` `1994` `1995` `1996` `1997`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.319 -0.852 -0.924 -0.00283 0.639 0.523 0.257
#> 2 0.660 -0.385 -0.356 0.0836 0.290 0.705 -0.471
#> 3 0.699 -1.36 -0.825 0.459 -0.319 0.800 0.778
#> 4 1.11 -0.689 -2.08 0.425 0.0427 1.55 1.35
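If you would rather avoid the extra packages, the same "latest round wins" rule can probably be sketched in base R as well. This is only an illustration, not the approach above: it rebuilds the example data with base functions, binds all rounds side by side, and keeps the last occurrence of each year name with duplicated(..., fromLast = TRUE). The seed and the names combined and latest are assumptions made for the example, and it relies on the rounds being stored in the list from oldest to newest.

```r
# Assumption: a fixed seed, only so the example is reproducible.
set.seed(1)

# Rebuild the example data with base R only (same shape as df_1 in the question).
year_ranges <- lapply(0:2, function(increment) (1991:1995) + increment)
df_1 <- lapply(year_ranges, function(year_range) {
  cols <- lapply(year_range, function(col) rnorm(n = 4))
  names(cols) <- as.character(year_range)
  data.frame(cols, check.names = FALSE)  # keep "1991" etc. as column names
})

# Bind all rounds column-wise; duplicated year names are kept at this point.
combined <- do.call(cbind, df_1)

# Keep only the last occurrence of each year, i.e. the value from the newest round.
latest <- combined[!duplicated(names(combined), fromLast = TRUE)]

names(latest)
```

Because later rounds sit further to the right in combined, fromLast = TRUE picks the newest copy of every year, and the remaining columns stay in chronological order.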