Map/mapply with all possible combinations of two lists


I am new to using R and apply and I am trying to download a set of .csv files from a website.

I want to download the years 2004 and 2005 (as an example; I want more years in fact) of three countries, Guatemala (GT), El Salvador (SV), and Honduras (HN).

I could run it country by country with something like this:

years = c(2004, 2005)    
Map(download.file, url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", years, "/DEUAGT%20S1%20", years, ".csv"), 
          destfile = paste0(raw_data, years, ".csv") )

This would get me the Guatemalan databases for the years 2004 and 2005, since the Guatemalan files are identified by "DEUAGT" in the URL. The Honduran and Salvadoran databases are "DEUAHN" and "DEUASV", respectively.

But since I'm trying to learn, I wanted to make everything in "one run". So I tried:

countries = c("GT", "HN", "SV")
years = c(2004, 2005, 2007, 2009:2019)

Map(possibly(download.file, otherwise = NA), url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", years, "/DEUA", countries, "%20", years, ".csv"), 
              destfile = paste0(raw_data, countries, years,".csv"))

But instead of downloading the 6 files I wanted (three countries, two years), it downloaded 2 files.

Various posts I found here and in the RStudio Community noted that Map/mapply does not run through all possible combinations of the vectors "countries" and "years", but instead pairs them element-wise (or something similar).

I found various suggestions in different settings but none particularly easy, and something tells me there is an easy solution for this. Using expand.grid creates a data frame and not a list of lists.
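
The element-wise pairing described above can be seen without downloading anything. The sketch below is only an illustration: it reuses the `countries` codes from the question with a shortened `years` vector, and the names `pointwise`, `comb` and `all_pairs` are made up for this example. It also shows that the columns returned by `expand.grid` are ordinary equal-length vectors, so they can be passed to `Map` directly.

```r
countries <- c("GT", "HN", "SV")
years <- c(2004, 2005)

# paste0 (like Map/mapply) pairs elements by position and recycles the
# shorter vector, so this yields 3 strings rather than 3 x 2 = 6.
pointwise <- paste0(countries, years)

# expand.grid builds every country/year pair, one pair per row.
comb <- expand.grid(country = countries, year = years,
                    stringsAsFactors = FALSE)

# The two columns are equal-length vectors, so Map can iterate over
# them in parallel and visit all 6 combinations.
all_pairs <- unlist(Map(function(country, year) paste0(country, year),
                        comb$country, comb$year),
                    use.names = FALSE)
```

So a data frame from `expand.grid` is usable here even though it is not a list of lists: each column plays the role of one argument vector.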


Solution

  • You can use the following solution. It is better to use purrr::walk2 in place of purrr::map2, since we are calling download.file only for its side effect:

    library(purrr)
    
    # First we create a data frame of all combinations of countries and years
    comb <- expand.grid(countries, years)
    
    # Then I wrap `download.file` with possibly for error handling
    poss_download <- possibly(download.file, otherwise = NA)
    
    # Then I apply our function on every combination of countries and years 
    # in a row-wise operation
    
    walk2(comb$Var1, comb$Var2, ~ {
      url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", .y, "/DEUA", .x, "%20", .y, ".csv")
      destfile = paste0(raw_data, .x, .y,".csv")
      poss_download(url, destfile)
    })
    

    Here is a base R solution for this question.

    • Instead of paste0 I used sprintf, which according to the documentation "returns a character vector containing a formatted combination of text and variable values". I used %d for the numeric values (twice, for the years) and %s for the character strings (once, for the countries). One value must be supplied for each placeholder, in the order in which the placeholders appear, and a literal % (as in %20) has to be written as %%.
    • Then I used tryCatch in place of purrr::possibly to handle possible errors.
    • In the end I used mapply (or Map) to iterate over both vectors, url and destfile, at the same time.

    comb <- expand.grid(countries, years)
    
    url <- sprintf("https://www.colef.mx/emif/datasets/basesdeDatos/sur/%d/DEUA%s%%20%d.csv", comb$Var2, comb$Var1, comb$Var2)
    
    destfile = paste0(raw_data, comb$Var1, comb$Var2,".csv")
    
    mapply(function(x, y) {
      # x is a single URL and y is its matching destination file
      tryCatch(download.file(x, y),
               error = function(e) {
                 NA
               })
    }, url, destfile)
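
As a quick check of the steps above that does not touch the network, the sketch below builds the URLs with `sprintf` and runs them through the same `tryCatch` pattern. `fake_download` is a hypothetical stand-in for `download.file` that fails for one year on purpose, and `raw_data` is set to a temporary path purely for illustration; neither comes from the original post.

```r
countries <- c("GT", "HN", "SV")
years <- c(2004, 2005)
raw_data <- file.path(tempdir(), "emif_")  # assumed prefix, for illustration

comb <- expand.grid(countries, years, stringsAsFactors = FALSE)

# sprintf is vectorised: it returns one string per row of comb.
# "%%" produces the literal "%" needed for the "%20" in the URL.
url <- sprintf(
  "https://www.colef.mx/emif/datasets/basesdeDatos/sur/%d/DEUA%s%%20%d.csv",
  comb$Var2, comb$Var1, comb$Var2
)
destfile <- paste0(raw_data, comb$Var1, comb$Var2, ".csv")

# Hypothetical stand-in for download.file: errors for 2005, else returns 0.
fake_download <- function(url, destfile) {
  if (grepl("/2005/", url, fixed = TRUE)) stop("file not found")
  0L
}

# Same pattern as the answer: an error becomes NA instead of stopping the loop.
status <- mapply(function(u, d) {
  tryCatch(fake_download(u, d), error = function(e) NA)
}, url, destfile, USE.NAMES = FALSE)
```

Inspecting `url` and `status` before swapping `download.file` back in is a cheap way to confirm that every country/year pair is covered and that a missing file does not abort the remaining downloads.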