I'm trying to have R automatically download and read a .csv file (daily streamflow) from Water Survey of Canada website to a local directory. To do this manually, I have to click on the "Download?" button at this link. And then, I need to click "Comma Separated Values" under "Discharge (daily mean values)".
From the link above, it is clear that I can modify "stn" (station number), "startDate", and "endDate" to obtain data from different stations and periods.
However, when I click the "Download?" button, the link will change to https://wateroffice.ec.gc.ca/download/index_e.html?results_type=real_time, which does not include any station information. I wonder how to download and read the "Discharge (daily mean values)" .csv file for different stations and periods?
library(httr)
library(readr)
base_url <- "https://wateroffice.ec.gc.ca/report/real_time_e.html?stn=07BJ001&mode=Table&startDate=2024-06-20&endDate=2024-07-23&prm1=46&y1Max=&y1Min=&prm2=47&y2Max=&y2Min=#wb-auto-4"
response <- GET(base_url)
The station and dates you have selected in the first step are not passed via the URL as you correctly observed, but are "remembered" at the server. You got a session cookie when you first selected station and date and via this cookie the server can retrieve your selection (I am by no means a WebDev, so excuse my lack of proper technical terms).
That is, all we have to do is to simulate this process in the script as well:
library(httr2)
library(glue)
library(readr)
from <- as.Date("2024-06-20")
to <- as.Date("2024-07-23")
stn <- "07BJ001"
select_url <- glue("https://wateroffice.ec.gc.ca/report/real_time_e.html?",
"stn={stn}&mode=Table&startDate={format(from, '%Y-%m%-%d')}&",
"endDate={format(to, '%Y-%m%-%d')}&",
"prm1=46&y1Max=&y1Min=&prm2=47&y2Max=&y2Min=#wb-auto-4")
download_url <- paste0("https://wateroffice.ec.gc.ca/download/report_e.html?",
"dt=6&df=csv&ext=zip")
cookie_path <- tempfile()
result_path <- tempfile(fileext = ".zip")
select_req <- request(select_url) |>
req_cookie_preserve(cookie_path) |>
req_perform()
download_req <- request(download_url) |>
req_cookie_preserve(cookie_path) |>
req_perform()
download_req |>
resp_body_raw() |>
writeBin(result_path)
td <- tempdir()
fn <- unzip(result_path, list = TRUE)[, "Name"]
unzip(result_path, exdir = td)
(dat <- read_csv(file.path(td, fn),
skip = 9))
# Rows: 34 Columns: 3
# ── Column specification ────────────────────────────────────────────────────────
# Delimiter: ","
# dbl (2): Parameter, Value (m³/s)
# dttm (1): Date (MST)
# ℹ Use `spec()` to retrieve the full column specification for this data.
# ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# # A tibble: 34 × 3
# `Date (MST)` Parameter `Value (m³/s)`
# <dttm> <dbl> <dbl>
# 1 2024-06-20 00:00:00 6 19.7
# 2 2024-06-21 00:00:00 6 15.7
# 3 2024-06-22 00:00:00 6 13
# 4 2024-06-23 00:00:00 6 11.1
# 5 2024-06-24 00:00:00 6 9.93
# 6 2024-06-25 00:00:00 6 15.5
# 7 2024-06-26 00:00:00 6 21.3
# 8 2024-06-27 00:00:00 6 103
# 9 2024-06-28 00:00:00 6 165
# 10 2024-06-29 00:00:00 6 80.6
# # ℹ 24 more rows
# # ℹ Use `print(n = ...)` to see more rows