My aim: in Rails 3, download a PDF file from a site that requires you to log in before you can download it.
My method, using Mechanize:
Step 1: log in
Step 2: since I'm logged in, get the PDF link
The problem is that when I debug and click on the scraped link, I'm redirected to the login page instead of getting the file.
Here are the two checks I ran on step 1:
(...)
search_results = form.submit
puts search_results.body
=> {"succes":true,"URL":"/sso/inscription/"}
Apparently the login succeeded.
puts agent.cookie_jar.jar
=> I could find the information about my session, so I guess the cookies are saved.
Any hint about what I did wrong? (This could be important: on the site, when you log in at "http://elwatan.com/sso/inscription/inscription_payant.php", you are redirected to the home page, elwatan.com.)
My code is below:
# step 1, login:
agent = Mechanize.new
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
form = page.form_with(:id => 'form-login-page')
form.login = "my_mail"
form.password = "my_password"
search_results = form.submit
# step 2, get the PDF:
@watan = {}
page.parser.xpath('//th/a').each do |link|
puts @watan[link.text.strip] = link['href']
end
The agent variable retains the session and cookies.
So you first log in, as you did, and then you call agent.get(---your-pdf-link-here--).
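If you want to confirm that a request really went through with the session, one option is to look at the URL you end up on after redirects. Here is a minimal sketch, assuming the login pages all live under "/sso/inscription/" (taken from the URL in the question). The helper name bounced_to_login? is made up for illustration and is not part of Mechanize.

```ruby
require 'uri'

# Hypothetical helper: returns true when the final URL of a request
# (after any redirects) points back into the login area, which would
# suggest the session was not accepted.
# The default prefix is an assumption based on the login URL in the question.
def bounced_to_login?(final_url, login_prefix = "/sso/inscription/")
  URI.parse(final_url.to_s).path.start_with?(login_prefix)
end

# Intended use with Mechanize (not run here):
#   file = agent.get(pdf_link)
#   warn "session not accepted" if bounced_to_login?(file.uri)
```

The object returned by agent.get exposes the final address through its uri method, so the check works on whatever the agent actually landed on rather than on the link you asked for.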
There is a small error in your example code: the result of the submit is stored in search_results, but you then keep using page (the login page fetched before submitting) to search for the links.
So in your case, I guess it should look like this (untested, of course):
# step 1, login:
agent = Mechanize.new
agent.pluggable_parser.pdf = Mechanize::FileSaver
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
form = page.form_with(:id => 'form-login-page')
form.login = "my_mail"
form.password = "my_password"
page = form.submit
# step 2, get the PDF:
page.parser.xpath('//th/a').each do |link|
agent.get link['href']
end
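If you prefer to separate collecting the links from downloading them, something along these lines might help. It is only a sketch: absolute_urls and fetch_all are names I made up, the base URL is assumed to be the site root, and the agent is passed in so that it is always the one that performed the login.

```ruby
require 'uri'

# Hypothetical helper: resolve scraped hrefs against the site root so that
# relative links become absolute; nils, blanks and duplicates are dropped.
# The default base is an assumption (the site root from the question).
def absolute_urls(hrefs, base = "http://elwatan.com/")
  hrefs.compact.map(&:strip).reject(&:empty?)
       .map { |href| URI.join(base, href).to_s }
       .uniq
end

# Request every URL through the agent that logged in, so the cookies in
# its jar are sent with each download. Returns whatever agent.get returns.
def fetch_all(agent, hrefs)
  absolute_urls(hrefs).map { |url| agent.get(url) }
end

# Intended use with Mechanize (not run here):
#   hrefs = page.parser.xpath('//th/a').map { |link| link['href'] }
#   fetch_all(agent, hrefs)
```

The important design point is that fetch_all never creates a new agent: a fresh Mechanize.new would start with an empty cookie jar and send you back to the login page.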