I am trying to port some old web scraping scripts, written with older Perl modules, so that they use only Mojolicious.
I have written a few basic scripts with Mojo, but I am puzzled about how to handle an authenticated login through a secure login site in a Mojo::UserAgent script. Unfortunately, the only example I can find in the documentation is for basic authentication without forms.
The Perl script I am trying to convert to Mojo::UserAgent is as follows:
#!/usr/bin/perl
use LWP;
use LWP::Simple;
use LWP::Debug qw(+);
use LWP::Protocol::https;
use WWW::Mechanize;
use HTTP::Cookies;
# login first before navigating to pages
# Create our automated browser and set up to handle cookies
my $agent = WWW::Mechanize->new();
$agent->cookie_jar(HTTP::Cookies->new());
$agent->agent_alias( 'Windows IE 6' ); #tell the website who we are (old!)
# get login page
$agent->get("https://reg.mysite.com");
$agent->success or die $agent->response->status_line;
# complete the user name and password form
$agent->form_number (1);
$agent->field (username => "user1");
$agent->field (password => "pass1");
$agent->click();
#try to get member's only content page from main site on basis we are now "logged in"
$agent->get("http://www.mysite.com/memberpagesonly1");
$agent->success or die $agent->response->status_line;
$member_page = $agent->content();
print "$member_page\n";
The above works fine. How do I convert it to do the same job in Mojolicious?
Mojolicious is a web application framework. While Mojo::UserAgent works well as a low-level HTTP user agent, and provides facilities that are unavailable from LWP (in particular, native support for asynchronous requests and IPv6), neither is as convenient to use as WWW::Mechanize for web scraping.
WWW::Mechanize subclasses LWP::UserAgent to interface with the internet, and uses HTML::Form to process the forms it finds. Mojo::UserAgent has no facility for processing HTML forms, so building the corresponding HTTP requests is not at all straightforward. Details such as the HTTP method used (GET or POST), the names of the form fields, and the default values of hidden fields are all handled automatically by HTML::Form, and are left to the programmer if you restrict yourself to Mojo::UserAgent.
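To give an idea of what "left to the programmer" means, here is a rough sketch of doing the form handling by hand with Mojo::UserAgent and Mojo::DOM. It is only an illustration under several assumptions: that the login page contains a single plain HTML form with fields named username and password (as in the question), that Mojolicious 7.0 or later is installed (for $tx->result), and that no JavaScript is involved. The names form_request and login_and_fetch are my own hypothetical helpers, not part of Mojolicious.

```perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::URL;
use Mojo::UserAgent;

# Hypothetical helper: work out by hand what HTML::Form would do for us.
# Returns the lower-cased method, the absolute action URL and a hash of fields.
sub form_request {
    my ($form, $base_url, $values) = @_;

    # A form without a method attribute is submitted with GET
    my $method = lc($form->attr('method') // 'get');

    # The action may be relative, so resolve it against the page URL
    my $action = Mojo::URL->new($form->attr('action') // '')->to_abs($base_url);

    # Start with the defaults of any hidden fields, then add our own values
    my %fields;
    for my $input ($form->find('input[type=hidden][name]')->each) {
        $fields{ $input->attr('name') } = $input->attr('value') // '';
    }
    @fields{ keys %$values } = values %$values;

    return ($method, $action, \%fields);
}

# Hypothetical driver mirroring the WWW::Mechanize script in the question
sub login_and_fetch {
    # Mojo::UserAgent keeps a cookie jar by default
    my $ua = Mojo::UserAgent->new(max_redirects => 5);

    # Get the login page
    my $tx  = $ua->get('https://reg.mysite.com');
    my $res = $tx->result;
    die $res->message unless $res->is_success;

    # Fill in the first form on the page (field names assumed)
    my $form = $res->dom->at('form') or die "No form found\n";
    my ($method, $action, $fields) = form_request($form, $tx->req->url,
        { username => 'user1', password => 'pass1' });

    # The "form" generator puts the fields in the body for POST
    # and in the query string for GET
    $res = $ua->$method($action => form => $fields)->result;
    die $res->message unless $res->is_success;

    # Fetch the members-only page using the cookies set by the login
    $res = $ua->get('http://www.mysite.com/memberpagesonly1')->result;
    die $res->message unless $res->is_success;
    return $res->body;
}
```

Calling print login_and_fetch(), "\n"; would then play the role of the original script. Note how much this leaves out compared with HTML::Form: select lists, checkboxes, radio buttons, submit button values and file uploads would all need their own handling.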
It seems to me that even trying to use Mojo::UserAgent in combination with HTML::Form is problematic, as the former requires a Mojo::Transaction::HTTP object to represent the submission of a filled-in form, whereas the latter generates HTTP::Request objects for use with LWP.
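For what it is worth, the mismatch can be bridged by copying the pieces of the HTTP::Request into a new transaction, although I would treat the following as an untested-in-anger sketch rather than a recommendation. It assumes HTML::Form is installed alongside Mojolicious, and tx_from_form is a hypothetical helper name of my own. It handles only simple URL-encoded forms and keeps just the last value of any repeated header.

```perl
use strict;
use warnings;
use HTML::Form;
use Mojo::UserAgent;

# Hypothetical helper: let HTML::Form build the request, then copy its
# method, URL, headers and body into a Mojo::Transaction::HTTP object.
sub tx_from_form {
    my ($ua, $html, $base_url, %values) = @_;

    # In scalar context parse() returns the first form on the page
    my $form = HTML::Form->parse($html, $base_url)
        or die "No form found\n";

    # Fill in the fields; value() dies if a field does not exist
    $form->value($_ => $values{$_}) for keys %values;

    # click() returns an HTTP::Request describing the submission
    my $req = $form->click;

    # Copy the headers across (repeated headers keep only the last value)
    my %headers;
    $req->headers->scan(sub {
        my ($name, $value) = @_;
        $headers{$name} = $value;
    });

    return $ua->build_tx(
        $req->method => $req->uri->as_string => \%headers => $req->content
    );
}
```

The resulting transaction can then be passed to $ua->start($tx). You would still have to write the equivalents of WWW::Mechanize's navigation methods yourself.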
In short, unless you are willing to largely rewrite WWW::Mechanize, I think there is no way to reimplement your software using Mojolicious modules.