I am new to using R and apply and I am trying to download a set of .csv files from a website.
I want to download the years 2004 and 2005 (as an example; I want more years in fact) of three countries, Guatemala (GT), El Salvador (SV), and Honduras (HN).
I could run country by country something like this:
years = c(2004, 2005)
Map(download.file, url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", years, "/DEUAGT%20S1%20", years, ".csv"),
destfile = paste0(raw_data, years, ".csv") )
This gets me the Guatemalan databases for 2004 and 2005, as the Guatemalan files are identified by "DEUAGT" in the URL. The Honduran and Salvadoran databases are "DEUAHN" and "DEUASV", respectively.
But since I'm trying to learn, I wanted to make everything in "one run". So I tried:
countries = c("GT", "HN", "SV")
years = c(2004, 2005, 2007, 2009:2019)
Map(possibly(download.file, otherwise = NA), url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", years, "/DEUA", countries, "%20", years, ".csv"),
destfile = paste0(raw_data, countries, years,".csv"))
But instead of downloading the 6 files I wanted (three countries, two years), it downloaded 2 files.
Various posts I found here and in the RStudio Community note that Map/mapply does not run through all possible combinations of the vectors "countries" and "years", but instead pairs their elements position by position.
I found various suggestions in different settings, but none particularly easy, and something tells me there is an easy solution for this. Using expand.grid creates a data frame and not a list of lists.
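To make the point-wise behaviour concrete, here is a minimal sketch that only builds file names and downloads nothing. It assumes a two-year vector for brevity; the column names country and year are illustrative choices, not part of the original code.

```r
countries <- c("GT", "HN", "SV")
years <- c(2004, 2005)

# paste0 (like Map/mapply) pairs elements position by position and recycles
# the shorter vector, so only three of the six combinations are produced.
pointwise <- paste0(countries, years)
print(pointwise)

# expand.grid builds one row per combination. Its columns are ordinary
# vectors, so they can be passed to paste0, Map or mapply directly.
comb <- expand.grid(country = countries, year = years,
                    stringsAsFactors = FALSE)
all_pairs <- paste0(comb$country, comb$year)
print(all_pairs)
```

So although expand.grid returns a data frame, each column can be used as one of the parallel arguments, which is what the answers below rely on.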
You can use the following solution. Because we call download.file only for its side effect, purrr::walk2 is a better choice than purrr::map2:
library(purrr)
# First we create a data frame of all combinations of countries and years
comb <- expand.grid(countries, years)
# Then I wrap `download.file` with possibly for error handling
poss_download <- possibly(download.file, otherwise = NA)
# Then I apply our function on every combination of countries and years
# in a row-wise operation
walk2(comb$Var1, comb$Var2, ~ {
url = paste0("https://www.colef.mx/emif/datasets/basesdeDatos/sur/", .y, "/DEUA", .x, "%20", .y, ".csv")
destfile = paste0(raw_data, .x, .y,".csv")
poss_download(url, destfile)
})
Here is a base R solution for this question. It differs from the purrr version in three ways:
Instead of paste0 I used the sprintf function, which according to the documentation "returns a character vector containing a formatted combination of text and variable values". I used %d for integer/numeric values (twice, for the years) and %s for character strings (once, for the countries). Note that we have to supply one argument for each placeholder, in order, so that each value is substituted in its place in the resulting string.
I used tryCatch in place of purrr::possibly to handle possible errors.
I used mapply (or Map) to iterate over both vectors, url and destfile, at the same time.
comb <- expand.grid(countries, years)
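# A literal "%" must be written as "%%" in sprintf, so the "%20" used in the URLs above becomes "%%20"
url <- sprintf("https://www.colef.mx/emif/datasets/basesdeDatos/sur/%d/DEUA%s%%20%d.csv", comb$Var2, comb$Var1, comb$Var2)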
destfile = paste0(raw_data, comb$Var1, comb$Var2,".csv")
mapply(function(x, y) {
  # x is a single URL and y the matching destination file for this iteration
  tryCatch(download.file(x, y),
           error = function(e) {
             NA
           })
}, url, destfile)
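If it helps to see which downloads failed, the value returned by mapply can be inspected afterwards. The following is only a sketch: fake_download is a hypothetical stand-in for download.file so that the example runs without network access, and the example.org URLs and file names are placeholders.

```r
# Hypothetical stand-in for download.file: returns 0 on success and raises
# an error for the Honduran file, to imitate a missing file on the server.
fake_download <- function(url, destfile) {
  if (grepl("DEUAHN", url)) stop("cannot open URL")
  0L
}

comb <- expand.grid(c("GT", "HN"), 2004, stringsAsFactors = FALSE)

# "%%" yields a literal "%", so "%%20" produces "%20" in the result.
url <- sprintf("https://example.org/%d/DEUA%s%%20%d.csv",
               comb$Var2, comb$Var1, comb$Var2)
destfile <- paste0(comb$Var1, comb$Var2, ".csv")

# Each call receives one URL (x) and one destination file (y);
# an error is turned into NA instead of stopping the loop.
status <- mapply(function(x, y) {
  tryCatch(fake_download(x, y), error = function(e) NA_integer_)
}, url, destfile)

# mapply names the result after the first argument, so the failed URLs
# can be read off directly.
failed <- names(status)[is.na(status)]
print(failed)
```

With the real download.file, a successful download also returns 0, so the same is.na check should apply, though warnings (as opposed to errors) are not caught by this handler.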