Tags: r, windows, web-scraping, rvest, httr

How to proceed when redirected to another page after a successful sign-in with the POST method


I have signed in to a website using R 3.5.2, and this seems to have gone well with both rvest_0.3.4 and httr_1.4.0. However, I then get stuck on a redirect page which, in the browser (Chrome), is shown for only a few seconds after I hit the "Login!" button.

The problematic step seems to be a form with method="post" and input type="hidden" fields, which I have not managed to submit from R.

URL of the CDP sign-in page:

signin <- "https://www.cdp.net/en/users/sign_in"

rvest

library(rvest)

user.email <- "my_email"
user.password <- "my_password"

signin.session <- html_session(signin)
signin.form <- html_form(signin.session)[[1]]
filled.signin <- set_values(signin.form, 
                            `user[email]` = user.email, 
                            `user[password]` = user.password)

signed.in <- submit_form(signin.session, filled.signin)
read_html(signed.in) %>% html_node("form")

httr

library(httr)

login <- list(
    `user[email]` = "my_email",
    `user[password]` = "my_password",
    submit = "Login!")

signed.in.post <- POST(signin, body = login, encode = "form", verbose())
http_status(signed.in.post)

content(signed.in.post, as = "parsed")

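# Parse the POST response itself: read_html(signed.in.post$url) would download
# the page again in a separate request that does not carry the login cookies
content(signed.in.post, as = "parsed") %>% html_node("form")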

My goal is to access my account and browse the website, but I don't know how to get past the redirect page from R.


Solution

  • SOLVED!
    The solution turned out to be simple: I just needed to submit the form (method="post", with input type="hidden" fields) on the redirect page, i.e. the page returned in the signed.in session. I solved it with rvest, though I expect httr would be equally easy. Here is the code I used:

       library(rvest)
    
       signin.session <- html_session(signin)
       signin.form <- html_form(signin.session)[[1]]
       filled.signin <- set_values(signin.form, 
                                   `user[email]` = user.email, 
                                   `user[password]` = user.password)
    
       signed.in <- submit_form(signin.session, filled.signin)
       redirect.form <- html_form(signed.in)[[1]]
       redirected <- submit_form(signed.in, redirect.form) 
    

    The last object, redirected, is a session object: it holds the page you normally land on after signing in, and it can be used to browse the rest of the website.
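
    For readers on a newer rvest: since rvest 1.0.0 the functions used above are deprecated in favour of session(), html_form_set() and session_submit(). Below is a sketch of the same flow with the newer names, assuming rvest >= 1.0.0. The wrapper name sign_in_rvest1() is made up for illustration, and the code has not been run against the CDP site.

    ```r
    library(rvest)  # assumption: rvest >= 1.0.0

    # Hypothetical wrapper: same two-step flow as above, with the renamed functions.
    sign_in_rvest1 <- function(signin_url, email, password) {
      signin.session <- session(signin_url)                 # was html_session()
      signin.form    <- html_form(signin.session)[[1]]
      filled.signin  <- html_form_set(signin.form,          # was set_values()
                                      `user[email]`    = email,
                                      `user[password]` = password)
      signed.in      <- session_submit(signin.session, filled.signin)  # was submit_form()

      # The intermediate page carries a form with hidden inputs: submit it as is.
      redirect.form  <- html_form(signed.in)[[1]]
      session_submit(signed.in, redirect.form)
    }
    ```

    It would be called as sign_in_rvest1(signin, user.email, user.password) and should return the same kind of session object as redirected.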

    If someone has a shorter, more effective, or more elegant solution, please don't hesitate to share it.
    I'm an absolute beginner at web scraping, and I am keen to learn more about these operations!

    THX
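
The answer above suggests that httr would work just as well. Here is a minimal sketch of that variant, under some assumptions. The helper names hidden_form_fields() and sign_in_with_httr() are made up for illustration. The first form on each page is taken to be the relevant one, as in the rvest code. The hidden fields of the sign-in page itself are sent back together with the credentials. It has not been run against the CDP site.

```r
library(httr)
library(xml2)
library(rvest)

# Hypothetical helper: return the action URL and the hidden inputs of the
# first form on a parsed page. `base_url` resolves a relative action.
hidden_form_fields <- function(page, base_url) {
  form <- html_node(page, "form")
  if (inherits(form, "xml_missing")) stop("no form found on the page")
  hidden <- html_nodes(form, "input[type='hidden']")
  name   <- html_attr(hidden, "name")
  value  <- html_attr(hidden, "value")
  value[is.na(value)] <- ""   # inputs without a value attribute
  keep   <- !is.na(name)      # inputs without a name are never submitted
  list(
    action = url_absolute(html_attr(form, "action"), base_url),
    fields = setNames(as.list(value[keep]), name[keep])
  )
}

# Hypothetical wrapper: sign in, then re-submit the hidden form found on the
# intermediate page. httr keeps cookies across requests to the same host.
sign_in_with_httr <- function(signin_url, email, password) {
  signin_page <- GET(signin_url)
  signin_form <- hidden_form_fields(
    content(signin_page, as = "parsed", encoding = "UTF-8"), signin_page$url)

  body <- c(signin_form$fields,
            list(`user[email]` = email, `user[password]` = password))
  signed_in <- POST(signin_form$action, body = body, encode = "form")
  stop_for_status(signed_in)

  # Assumption: the response is the intermediate page with the hidden form
  redirect <- hidden_form_fields(
    content(signed_in, as = "parsed", encoding = "UTF-8"), signed_in$url)
  POST(redirect$action, body = redirect$fields, encode = "form")
}
```

Called as sign_in_with_httr(signin, user.email, user.password), this should return the response for the page behind the redirect. Because httr reuses cookies for requests to the same host, later GET() calls to that host should stay signed in.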