I have code that logs in to a site and then downloads a PDF file. It works perfectly when I run it as a plain Ruby script. The problem is that when I use the same code in a Rails app with instance variables, I can't download the file. I suspect a cookie issue, but I haven't managed to resolve it.
Here is the code that works in plain Ruby (I can download the PDF file, so the login succeeds):
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
agent.pluggable_parser.pdf = Mechanize::FileSaver
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
# login to the site
form = page.form_with(:id => 'form-login-page')
form.login = "my_login"
form.password = "my_password"
page = form.submit
#get the PDF link
agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
agent.get link['href']
end
And below is my attempt in Ruby on Rails 3, which doesn't work. I can scrape the link, but I can't download the file because I get redirected to the login page:
Controller.rb
@agent = Mechanize.new
@agent.user_agent_alias = 'Mac Safari'
@page = @agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
# login
form = @page.form_with(:id => 'form-login-page')
form.login = "my_login"
form.password = "my_password"
@page = form.submit
# get the PDF link
@watan = {}
@agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
@watan[link.text.strip] = @agent.get link['href']
end
View.rb
<% if @watan %>
<% @watan.each do |key, value| %>
<a href="http://www.elwatan.com<%= "#{key}" %>" target='_blank'>download my file</a>
<% end %>
<% end %>
This will be a long post.
First off, you should place your scraping code in a library, so create the file lib/watan_scraper.rb and fill it with
require 'mechanize'

module WatanScraper
  # Returns the text of every PDF link found on the home page.
  def self.get_all_pdfs
    agent = get_agent
    # get the PDF links
    watan = []
    agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      watan << link.text.strip
    end
    watan
  end

  # Fetches the PDF whose link text equals link_text; returns nil if no link matches.
  def self.get_single_pdf(link_text)
    agent = get_agent
    # get the PDF link
    found_link = nil
    agent.get("http://elwatan.com/").parser.xpath('//div[2]/div/p/a/@href|/img').each do |link|
      # compare with ==; a single = would assign and always be truthy
      if link.text.strip == link_text
        found_link = link['href']
      end
    end
    # fetch the pdf with the logged-in agent
    agent.get(found_link) if found_link
  end

  # Builds a Mechanize agent that is already logged in.
  def self.get_agent
    agent = Mechanize.new
    agent.user_agent_alias = 'Mac Safari'
    page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
    # login
    form = page.form_with(:id => 'form-login-page')
    form.login = "my_login"
    form.password = "my_password"
    form.submit
    agent
  end
  # a bare `private` does not apply to methods defined with `def self.`
  private_class_method :get_agent
end
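One thing to watch in a module like this: a bare `private` keyword does not hide methods defined with `def self.`, and a helper defined with a plain `def` cannot be called from those module-level methods at all. A minimal sketch of the pattern, using a hypothetical `DemoScraper` module and placeholder return values instead of Mechanize (nothing here touches the network):

```ruby
# Hypothetical stand-in for WatanScraper; names and return values are
# placeholders chosen only to illustrate method visibility.
module DemoScraper
  # Public entry point, callable as DemoScraper.fetch_all.
  def self.fetch_all
    build_agent + [:fetched]
  end

  # Helper defined on the module itself so fetch_all can reach it.
  def self.build_agent
    [:logged_in]
  end
  # Hide the helper from outside callers.
  private_class_method :build_agent
end

p DemoScraper.fetch_all

begin
  DemoScraper.build_agent
rescue NoMethodError => e
  puts "helper is hidden: #{e.class}"
end
```

Calling `DemoScraper.build_agent` from outside raises `NoMethodError`, while `fetch_all` can still use it internally.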
Ok, and now you can write in your controller
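# lib/ is on the load path but is not autoloaded by default in Rails 3
require 'watan_scraper'

class PdfsController < ApplicationController
  def index
    @watan = WatanScraper.get_all_pdfs
  end

  def show
    pdf_name = params[:id]
    @pdf = WatanScraper.get_single_pdf(pdf_name)
    return head(:not_found) unless @pdf
    # send the raw bytes of the fetched file back to the browser
    send_data @pdf.body, :filename => "#{pdf_name}.pdf", :type => "application/pdf"
  end
end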
Your view should be in the file app/views/pdfs/index.html.haml (let's use haml):
- @watan.each do |link_text|
  = link_to "Download #{link_text}", pdf_path(link_text)
Your routes should be as follows (config/routes.rb):
resources :pdfs, only: [:index, :show]
This code is of course untested, but it is at least nicely structured: it fetches the PDF in the right session (using Mechanize) and then sends it back to the browser.
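Since the code above is untested, one piece that can be checked without touching the network is the link lookup done in get_single_pdf. Here is a minimal sketch, assuming a hypothetical `Link` struct standing in for the parsed nodes and a hypothetical `find_href` helper. Note that it returns the first match, whereas the loop above keeps the last one:

```ruby
# Hypothetical stand-in for a parsed link node: responds to #text and #['href'].
Link = Struct.new(:text, :href)

# Returns the href of the first link whose stripped text equals link_text,
# or nil when nothing matches.
def find_href(links, link_text)
  match = links.find { |link| link.text.strip == link_text }
  match && match['href']
end

# Made-up sample data, only for illustration.
links = [
  Link.new(" edition-a ", "/pdf/a.pdf"),
  Link.new("edition-b", "/pdf/b.pdf")
]

puts find_href(links, "edition-b").inspect
puts find_href(links, "missing").inspect
```

Using `==` inside the block is the important part: a single `=` would assign instead of compare, so every link would appear to match.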