Suppose I want to use rvest to search Google. I can do that using the code below.
url <- 'https://www.google.com/'
search_parameters <- list('q' = 'dogs')
search_results <-
  rvest::session(url) |>
  rvest::html_form() |>
  purrr::pluck(1) |>
  rvest::html_form_set(!!!search_parameters) |>
  rvest::html_form_submit()
#> Submitting with 'btnG'
search_results$status_code
#> [1] 200
However, I can't figure out how to navigate to the first link of the results, because html_form_submit() doesn't return a session object.
search_results |>
  rvest::session_follow_link(1)
#> Error in `check_session()`:
#> ! `x` must be produced by session()
#> Backtrace:
#> x
#> 1. \-rvest::session_follow_link(search_results, 1)
#> 2. \-rvest:::check_session(x)
#> 3. \-rlang::abort("`x` must be produced by session()")
I know I could just create a new session for the example above, but that doesn't work if I need to log in to a site first. Is there a way to use the same session object to continue navigating?
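You are probably looking for session_submit(). Unlike html_form_submit(), which returns only the HTTP response, it submits the form within the session and returns the updated session object, so you can keep navigating: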
url <- 'https://www.google.com/'
search_parameters <- list('q' = 'dogs')
s <- rvest::session(url)
s <-
  rvest::html_form(s) |>
  purrr::pluck(1) |>
  rvest::html_form_set(!!!search_parameters) |>
  rvest::session_submit(s, form = _)
#> Submitting with 'btnG'
s |>
  rvest::session_follow_link(1)
#> Navigating to
#> https://accounts.google.com/ServiceLogin?...
#> <session> https://accounts.google.com/v3/signin/identifier?...
#> Status: 200
#> Type: text/html; charset=utf-8
#> Size: 555260
Created on 2023-06-01 with reprex v2.0.2
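For the login case mentioned in the question, the same pattern could be wrapped in a small helper. This is only a sketch: submit_in_session() is a hypothetical name, not part of rvest, and the URLs and field names in the commented usage are placeholders that would have to match the real site's form.

```r
# Hypothetical helper (not an rvest function): fill in one form on the
# session's current page and submit it with session_submit(), so the
# result is still a session and keeps its cookies.
submit_in_session <- function(s, values, form_index = 1) {
  form <- rvest::html_form(s)[[form_index]]      # pick a form on the current page
  form <- rvest::html_form_set(form, !!!values)  # splice the named list of field values
  rvest::session_submit(s, form)                 # returns the updated session
}

# Usage sketch (placeholder URLs and field names, assumed for illustration):
# s <- rvest::session("https://example.com/login")
# s <- submit_in_session(s, list(username = "me", password = "secret"))
# s <- rvest::session_jump_to(s, "https://example.com/private-page")
```

Because every step returns a session, later calls such as session_follow_link() or session_jump_to() keep working on the same logged-in session.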