As the title says, WWW::Mechanize does not recognize
<base href="" />
if the page content is gzipped. Here is an example:
use strict;
use warnings;
use WWW::Mechanize;
my $url = 'http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html';
my $mech = WWW::Mechanize->new;
$mech->get($url);
print $mech->base()."\n";
# force plain text instead of gzipped content
$mech->get($url, 'Accept-Encoding' => 'identity');
print $mech->base()."\n";
Output:
http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html
http://objectmix.com/ <--- this is correct !
Am I missing something here? Thanks
Edit: I just tested it directly with LWP::UserAgent and it works without any problems:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $res = $ua->get('http://objectmix.com/perl/356181-help-lwp-log-after-redirect.html');
print $res->base()."\n";
Output:
http://objectmix.com/
This looks like a WWW::Mechanize bug?
Edit 2: It is an LWP or HTTP::Response bug, not a WWW::Mechanize one. LWP does not request gzip by default. If I set
$ua->default_header('Accept-Encoding' => 'gzip');
in the above example, it returns the wrong base.
Edit 3: The bug is in parse_head() in LWP/UserAgent.pm.
It calls HTML::HeadParser with the gzipped HTML, and HeadParser has no idea what to do with it. LWP should gunzip the content before calling the parsing subroutine.
There is a bug report about this: https://rt.cpan.org/Public/Bug/Display.html?id=54361
Conclusion: LWP is missing this "feature".
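To illustrate the failure mode from Edit 3, here is a minimal local sketch. It does not use the network and it is not LWP's actual code: it only feeds HTML::HeadParser the same page once as gzipped bytes and once decoded. The helper name head_base and the HTML snippet are made up for the demonstration.

```perl
use strict;
use warnings;
use HTML::HeadParser;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: feed raw bytes to HTML::HeadParser and return the
# value of <base href>, which HeadParser exposes as the Content-Base header.
# Returns undef when no <base> element was seen.
sub head_base {
    my ($bytes) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($bytes);
    return scalar $parser->header('Content-Base');
}

# Made-up page standing in for the real response body.
my $html = '<html><head><base href="http://objectmix.com/" />'
         . '<title>test</title></head><body>hello</body></html>';

# Simulate a body sent with "Content-Encoding: gzip".
my $gzipped;
gzip(\$html => \$gzipped) or die "gzip failed: $GzipError";

# Decode it again; this is the step that has to happen before parsing.
my $decoded;
gunzip(\$gzipped => \$decoded) or die "gunzip failed: $GunzipError";

printf "gzipped body: %s\n", head_base($gzipped) // '(no base found)';
printf "decoded body: %s\n", head_base($decoded) // '(no base found)';
```

The parser only sees binary data in the first case, so it has no <base> element to pick up. The parser may also emit warnings on the binary input, depending on the HTML::Parser version.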
WWW::Mechanize:
This could eventually be solved by overriding _make_request() in WWW::Mechanize in your own subclass and replacing the HTTP::Response content with its decoded_content, or, even dirtier, by overwriting $mech->{base} with the base parsed from the content.
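A rough sketch of the second ("dirtier") option, under some assumptions: base_from_html is a hypothetical helper, not part of WWW::Mechanize or LWP, and it expects HTML that has already been decoded. It parses the head with HTML::HeadParser and falls back to the request URL when there is no <base> element.

```perl
use strict;
use warnings;
use HTML::HeadParser;
use URI;

# Hypothetical helper: return the absolute <base href> found in already
# decoded HTML as a URI object, or the fallback URL if there is none.
sub base_from_html {
    my ($html, $fallback) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($html);
    my $href = $parser->header('Content-Base');
    return URI->new($fallback) unless defined $href && length $href;
    # Resolve a relative href against the page URL.
    return URI->new_abs($href, $fallback);
}

# Made-up page and URL for a local check, no network needed.
my $page_url = 'http://objectmix.com/perl/page.html';
my $html = '<html><head><base href="http://objectmix.com/" />'
         . '<title>test</title></head><body>hello</body></html>';

print base_from_html($html, $page_url), "\n";
print base_from_html('<html><head></head><body>x</body></html>', $page_url), "\n";
```

With a live object this could be applied after each get() as $mech->{base} = base_from_html($mech->content, $mech->uri); assuming $mech->content returns the decoded HTML. $mech->{base} is an internal field, so this may stop working with other WWW::Mechanize versions.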