Download specific files from a URL in R

I would like to download multiple files (around 2000) from this URL: https://www.star.nesdis.noaa.gov/pub/corp/scsb/wguo/data/Blended_VH_4km/geo_TIFF/

However, to limit time and disk space, I would like to download only the files whose names contain VCI.tif, and only for the years 1981 to 2011.

I tried wget in bash but could not find a way to select only the files I want. Downloading everything also takes a huge amount of disk space (more than 140 GB).

Thank you!

Solution

The following builds the list of wanted file names in R and downloads each file with wget. I have only tested it on a (very) small subset of the wanted files; it works for at least the first 2.

suppressPackageStartupMessages({
  library(rvest)    # read_html(), html_elements(), html_attr()
  library(dplyr)    # mutate(), filter(), pull(), %>%
  library(stringr)  # str_extract()
})

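# big files need greater timeout values when R itself
# does the download (e.g. download.file); the external
# wget process does not read this option, so raising it
# is probably unnecessary here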
old_timeout <- options(timeout = 300)
getOption("timeout")

year_start <- 1981
year_end <- 2011
# expand "~" in R so the path does not depend on shell expansion
download_dir <- path.expand("~/Temp/")
wget_cmd_line <- c("-P", download_dir, "")

link <- "https://www.star.nesdis.noaa.gov/pub/corp/scsb/wguo/data/Blended_VH_4km/geo_TIFF/"
page <- read_html(link)

files_urls <- page %>%
  html_elements("a") %>%
  html_attr("href")

wanted_urls <- files_urls %>%
  str_extract(pattern = "^.*\\.VCI\\.tif$") %>%
  na.omit() %>%
  data.frame(filename = .) %>% 
  mutate(year = str_extract(filename, "\\d{7}"),
         year = str_extract(year, "^\\d{4}"),
         year = as.integer(year)) %>%
  filter(year >= year_start & year <= year_end)

wanted_urls %>%
  #
  # to test the code I only download 2 files;
  # comment out this instruction to download all of them
  head(n = 2) %>%
  #
  pull(filename) %>%
  lapply(\(x) {
    wget_cmd <- wget_cmd_line
    wget_cmd[3] <- paste0(link, x)
    system2("wget", args = wget_cmd, stdout = TRUE, stderr = TRUE)
  })

# put saved value back
options(old_timeout)
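
To check the name filter without touching the network, here is a minimal sketch of the same selection logic in base R. The file names below are hypothetical: I made them up to fit the pattern the regexes above expect (a 7-digit year-plus-week stamp and a `.VCI.tif` suffix), so compare them with the real directory listing before relying on this.

```r
# Hypothetical file names (assumption: shaped like the names the regexes above expect)
files <- c(
  "VHP.G04.C07.NC.P1981035.VH.VCI.tif",
  "VHP.G04.C07.NC.P1981035.VH.TCI.tif",
  "VHP.G04.C07.NJ.P1995010.VH.VCI.tif",
  "VHP.G04.C07.npp.P2015020.VH.VCI.tif"
)

year_start <- 1981
year_end <- 2011

# keep only the names that end in ".VCI.tif"
vci <- files[grepl("\\.VCI\\.tif$", files)]

# first run of 7 digits (year + week); its first 4 characters are the year
stamp <- regmatches(vci, regexpr("[0-9]{7}", vci))
year <- as.integer(substr(stamp, 1, 4))

# keep only the years in the wanted range
wanted <- vci[year >= year_start & year <= year_end]
print(wanted)
```

With these sample names, only the 1981 and 1995 VCI files are kept; the TCI file and the 2015 file are dropped. Note that `regmatches()` silently skips names with no 7-digit run, so this sketch assumes every VCI name carries such a stamp.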