Search code examples
rpurrr

Merging within lists and then across lists in R by Date


I have some lists which look like:

$SPA
           Date Formación bruta de capital fijo (real, índice) Consumo privado (real, índice) IRS 1Y
1Q80 1980-03-30                                             NA                             NA     NA
2Q80 1980-06-30                                             NA                             NA     NA
3Q80 1980-09-30                                             NA                             NA     NA
4Q80 1980-12-30                                             NA                             NA     NA
1Q81 1981-03-30                                             NA                             NA     NA
2Q81 1981-06-30                                             NA                             NA     NA
     Renta bruta disponible familias (nominal)
1Q80                                        NA
2Q80                                        NA
3Q80                                        NA
4Q80                                        NA
1Q81                                        NA
2Q81                                        NA

$GER
           Date IRS 1Y
1Q80 1980-03-30     NA
2Q80 1980-06-30     NA
3Q80 1980-09-30     NA
4Q80 1980-12-30     NA
1Q81 1981-03-30     NA
2Q81 1981-06-30     NA

$OTH
           Date Brent oil price (USD/barrel)
1Q80 1980-03-30                        38.25
2Q80 1980-06-30                           38
3Q80 1980-09-30                         33.4
4Q80 1980-12-30                        40.15
1Q81 1981-03-30                        38.35
2Q81 1981-06-30                        33.25

And also:

$FRA
        Date GDP....yoy. GDP....qoq.
1 1980-03-30        3.01        0.97
2 1980-06-30        1.81       -0.72
3 1980-09-30        0.70        0.15
4 1980-12-30        0.20       -0.20
5 1981-03-30       -0.45        0.32
6 1981-06-30        0.98        0.70

$GER
        Date GDP....yoy. GDP....qoq.
1 1980-03-30          NA          NA
2 1980-06-30          NA          NA
3 1980-09-30          NA          NA
4 1980-12-30          NA          NA
5 1981-03-30          NA          NA
6 1981-06-30          NA          NA

$ITA
        Date GDP....yoy. GDP....qoq.
1 1980-03-30          NA          NA
2 1980-06-30          NA          NA
3 1980-09-30          NA          NA
4 1980-12-30          NA          NA
5 1981-03-30          NA          NA
6 1981-06-30          NA          NA

$NOR
        Date GDP....yoy. GDP....qoq.
1 1980-03-30        3.75        1.31
2 1980-06-30        2.52       -1.16
3 1980-09-30        1.74        0.27
4 1980-12-30        0.81        0.40
5 1981-03-30        1.26        1.76
6 1981-06-30        2.21       -0.23

$SPA
        Date GDP....yoy. GDP....qoq.
1 1980-03-30          NA          NA
2 1980-06-30          NA          NA
3 1980-09-30          NA          NA
4 1980-12-30          NA          NA
5 1981-03-30          NA          NA
6 1981-06-30          NA          NA

I want to do 2 things:

  1. In lst1 I want to merge the OTH list with the SPA and GER list by Date. (This will eliminate the OTH list.

  2. Merge list1 and list2, again by Date but also not by their list names - i.e. so only SPA and GER will be merged since they are the only countries which appear in both lists.

Output: A single list consisting of just SPA and GER with OTH merged (in list1) and then both SPA and GER merged from list1 and list2.

Expected output GER (first 2 rows):

           Date IRS 1Y                  Date        Brent oil price (USD/barrel)       Date           GDP....yoy. GDP....qoq.
1Q80 1980-03-30     NA                  1Q80 1980-03-30      NA                    1 1980-03-30          NA          NA
2Q80 1980-06-30     NA                  1Q80 1980-03-30      38.25                 2 1980-06-30          NA          NA

Data:

lst1 = list(SPA = structure(list(Date = structure(c(3741, 3833, 3925, 
4016, 4106, 4198), class = "Date"), `Formación bruta de capital fijo (real, índice)` = c("NA", 
"NA", "NA", "NA", "NA", "NA"), `Consumo privado (real, índice)` = c("NA", 
"NA", "NA", "NA", "NA", "NA"), `IRS 1Y` = c("NA", "NA", "NA", 
"NA", "NA", "NA"), `Renta bruta disponible familias (nominal)` = c("NA", 
"NA", "NA", "NA", "NA", "NA")), row.names = c("1Q80", "2Q80", 
"3Q80", "4Q80", "1Q81", "2Q81"), class = "data.frame"), GER = structure(list(
    Date = structure(c(3741, 3833, 3925, 4016, 4106, 4198), class = "Date"), 
    `IRS 1Y` = c("NA", "NA", "NA", "NA", "NA", "NA")), row.names = c("1Q80", 
"2Q80", "3Q80", "4Q80", "1Q81", "2Q81"), class = "data.frame"), 
    OTH = structure(list(Date = structure(c(3741, 3833, 3925, 
    4016, 4106, 4198), class = "Date"), `Brent oil price (USD/barrel)` = c("38.25", 
    "38", "33.4", "40.15", "38.35", "33.25")), row.names = c("1Q80", 
    "2Q80", "3Q80", "4Q80", "1Q81", "2Q81"), class = "data.frame"))


lst2 = list(FRA = structure(list(Date = structure(c(3741, 3833, 3925, 
4016, 4106, 4198), class = "Date"), GDP....yoy. = c(3.01, 1.81, 
0.7, 0.2, -0.45, 0.98), GDP....qoq. = c(0.97, -0.72, 0.15, -0.2, 
0.32, 0.7)), row.names = c(NA, 6L), class = "data.frame"), GER = structure(list(
    Date = structure(c(3741, 3833, 3925, 4016, 4106, 4198), class = "Date"), 
    GDP....yoy. = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), GDP....qoq. = c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_)), row.names = c(NA, 6L), class = "data.frame"), 
    ITA = structure(list(Date = structure(c(3741, 3833, 3925, 
    4016, 4106, 4198), class = "Date"), GDP....yoy. = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), GDP....qoq. = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), row.names = c(NA, 
    6L), class = "data.frame"), NOR = structure(list(Date = structure(c(3741, 
    3833, 3925, 4016, 4106, 4198), class = "Date"), GDP....yoy. = c(3.75, 
    2.52, 1.74, 0.81, 1.26, 2.21), GDP....qoq. = c(1.31, -1.16, 
    0.27, 0.4, 1.76, -0.23)), row.names = c(NA, 6L), class = "data.frame"), 
    SPA = structure(list(Date = structure(c(3741, 3833, 3925, 
    4016, 4106, 4198), class = "Date"), GDP....yoy. = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), GDP....qoq. = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), row.names = c(NA, 
    6L), class = "data.frame"))

Solution

  • Not sure if this is the best solution, but the following code works. It uses the function left_join from the package dplyr.

    Side note: Like user onyambu mentioned, having multiple date columns is not necessary. So my solution does have the dates only as rownames.

    library(dplyr)
    merge_oth <- function(element_to_merge, lst){
      short_lst = lst[-which(names(lst) == element_to_merge)]
      single_elem = lst1[[element_to_merge]]
      fin_lst = lapply(short_lst, function(x){
        dplyr::left_join(x, single_elem, by = "Date")
      })
      return(fin_lst)
    }
    merged_list <- merge_oth(element_to_merge = "OTH", lst = lst1)
    
    merge_by_name <- function(list_1, list_2){
      keys = intersect(names(list_1), names(list_2))
      fin_lst = setNames(lapply(keys, function(key) {
        l1 = list_1[[key]]
        l2 = list_2[[key]]
        dplyr::left_join(l1, l2, by = "Date")
      }),  keys)
    }
    final <- merge_by_name(list_1 = merged_list, list_2 = lst2)
    

    The resulting list contains the merged lists for SPA and GER and following your example the output for GER looks like this:

    $GER
            Date IRS 1Y Brent oil price (USD/barrel) GDP....yoy. GDP....qoq.
    1 1980-03-30     NA                        38.25          NA          NA
    2 1980-06-30     NA                           38          NA          NA
    3 1980-09-30     NA                         33.4          NA          NA
    4 1980-12-30     NA                        40.15          NA          NA
    5 1981-03-30     NA                        38.35          NA          NA
    6 1981-06-30     NA                        33.25          NA          NA
    

    I hope this helps. I took inspiration from here.