Tags: perl, proxy, web-scraping, www-mechanize

Perl - WWW::Mechanize not working with proxy


Machine: Windows 7 Professional (64-bit), running portable Strawberry Perl 5.22.0.1 (64-bit).
Proxy settings in Internet Explorer:
- Automatically detect settings
- Use automatic configuration script
- Address: http://url:portno/proxy.pac

With the code below, I do not get a proper response.

use strict;
use warnings;
use WWW::Mechanize;
use LWP::UserAgent;
use LWP::Protocol::https;

print LWP::UserAgent->VERSION, "\n";
print LWP::Protocol::https->VERSION, "\n";

#$ENV{HTTPS_PROXY} = 'http://url:portno/proxy.pac';
#$objMech->get("http://www.url.html");

my $objMech = WWW::Mechanize->new(autocheck => 0 );
$objMech->proxy(['https', 'http', 'ftp'], 'http://url:portno/proxy.pac');
$objMech->get("http://www.url.com");
print $objMech->content();

my @links = $objMech->links();
for my $link (@links) {
    # Use an explicit format string; the link text can be undef.
    printf "%s, %s\n", $link->text // '', $link->url;
}

Its output is as follows:

6.13
6.06
<HTML>
<Head>
<TITLE>400 Bad Request
</TITLE>
</HEAD>
<BODY bgcolor="#FFFFFF"><h1>
400 Bad Request
</h1>
</BODY>
</HTML>

Solution

  • $objMech->proxy(['https', 'http', 'ftp'], 'http://url:portno/proxy.pac');

    You have to give the URL of the proxy itself (i.e. http://ip:port), not the URL where a proxy configuration script (the PAC file) is located. A PAC file is JavaScript code that returns the appropriate proxy URL for a given target URL. LWP and WWW::Mechanize do not execute JavaScript, so they cannot process such proxy configuration files by themselves.
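
    As a rough sketch of the fix described above, the script could pass the proxy's own address to proxy() instead of the PAC URL. The address http://proxy.example.com:8080 and the helper names new_mech_with_proxy and print_links are assumptions made up for illustration; the real host and port would have to be read out of the PAC file or obtained from the network administrator.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical proxy address (an assumption, not from the question):
# replace it with the host and port the PAC script returns for your target.
my $proxy_url = 'http://proxy.example.com:8080';

# Build a Mechanize object that routes http, https and ftp requests
# through the given proxy.
sub new_mech_with_proxy {
    my ($proxy) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 0);
    $mech->proxy(['http', 'https', 'ftp'], $proxy);
    return $mech;
}

# Fetch a page and print "text, url" for every link found on it.
sub print_links {
    my ($mech, $url) = @_;
    my $response = $mech->get($url);
    unless ($response->is_success) {
        warn "Request failed: ", $response->status_line, "\n";
        return;
    }
    for my $link ($mech->links) {
        printf "%s, %s\n", $link->text // '', $link->url;
    }
    return 1;
}

my $mech = new_mech_with_proxy($proxy_url);
print "http proxy in use: ", $mech->proxy('http'), "\n";

# Uncomment once $proxy_url points at a real proxy:
# print_links($mech, 'http://www.example.com/');
```

    Calling proxy() with only a scheme returns the proxy currently configured for that scheme, which gives a quick way to confirm the setting before any request is sent.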