
rvest: continue navigating after submitting a form


Suppose I want to use rvest to search Google. I can do that using the code below.

url <- 'https://www.google.com/'

search_parameters <-
  list('q' = 'dogs')

search_results <- 
  rvest::session(url) |>
  rvest::html_form() |> 
  purrr::pluck(1) |> 
  rvest::html_form_set(!!!search_parameters) |> 
  rvest::html_form_submit()
#> Submitting with 'btnG'

search_results$status_code
#> [1] 200

However, I can't figure out how to navigate to the first link in the results, because html_form_submit() returns an HTTP response rather than a session object.


search_parameters |>
  rvest::session_follow_link(1)
#> Error in `check_session()`:
#> ! `x` must be produced by session()

#> Backtrace:
#>     x
#>  1. \-rvest::session_follow_link(search_parameters, 1)
#>  2.   \-rvest:::check_session(x)
#>  3.     \-rlang::abort("`x` must be produced by session()")

I know I could just create a new session for the example above, but that doesn't work when I need to log in to a site first, since the new session would not carry over the login. Is there a way to keep using the same session object to continue navigating?


Solution

  • You are probably looking for session_submit(), which submits the form through an existing session and returns the updated session:

    url <- 'https://www.google.com/'
    
    search_parameters <-
      list('q' = 'dogs')
    
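    # Keep the session object so it can be reused after the form is submitted
    s <- rvest::session(url)
    
    # Submit through the session; `_` is the native pipe placeholder (R >= 4.2)
    s <- 
      rvest::html_form(s) |> 
      purrr::pluck(1) |> 
      rvest::html_form_set(!!!search_parameters) |> 
      rvest::session_submit(s, form = _)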
    #> Submitting with 'btnG'
    
    s |>
      rvest::session_follow_link(1)
    #> Navigating to
    #> https://accounts.google.com/ServiceLogin?...
    #> <session> https://accounts.google.com/v3/signin/identifier?...
    #>   Status: 200
    #>   Type:   text/html; charset=utf-8
    #>   Size:   555260
    

    Created on 2023-06-01 with reprex v2.0.2
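
For the log-in case mentioned in the question, the same pattern could be wrapped in a small helper. This is only a sketch: `submit_in_session()`, its arguments, the example.com URL, and the `username`/`password` field names are all made up for illustration, and a real site will likely use different field names or need a different form index.

```r
# Hypothetical helper (not part of rvest): fill in one form on the current
# page and submit it through the existing session, so that cookies and
# history carry over to later navigation.
submit_in_session <- function(s, fields, form_index = 1) {
  stopifnot(rvest::is.session(s))
  form <- rvest::html_form(s)[[form_index]]     # pick a form on the current page
  form <- rvest::html_form_set(form, !!!fields) # fill in the named fields
  rvest::session_submit(s, form)                # returns the updated session
}

# Usage sketch (not run): the URL and field names are placeholders.
# s <- rvest::session("https://example.com/login")
# s <- submit_in_session(s, list(username = "me", password = "secret"))
# s <- rvest::session_follow_link(s, 1)
```

Because every step reassigns `s`, later calls such as session_follow_link() or session_jump_to() keep working on the same session.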