
How can I make WWW::Mechanize not fetch pages twice?


I have a web-scraping application written in OO Perl that uses a single WWW::Mechanize object. How can I make it not fetch the same URL twice, i.e. make a second get() with the same URL a no-op?

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = 'http://google.com';

$mech->get( $url ); # first time: fetch
$mech->get( $url ); # same URL: should do nothing

Solution

  • You can subclass WWW::Mechanize and override get() so that it only fetches when the requested URL differs from the one fetched last:

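    package MyMech;
    use strict;
    use warnings;
    use parent 'WWW::Mechanize';

    sub get {
        my $self = shift;
        my ($url) = @_;

        # Fetch unless the last response was for this same URL.
        # On the very first call there is no response yet, so always fetch.
        # After a redirect, request->uri is the final URL, so a redirecting URL is re-fetched.
        if (!defined $self->res || $self->res->request->uri ne $url) {
            return $self->SUPER::get(@_);
        }
        return $self->res;    # same URL as last time: no-op
    }

    1;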
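The override above only remembers the most recent page, so fetching A, then B, then A again would re-fetch A. If each URL should be fetched at most once per object, one option is to remember every URL seen so far in a hash. The following is a minimal sketch under stated assumptions: `StubMech` is a hypothetical stand-in for WWW::Mechanize so the example runs offline, and `OnceMech` and the `_seen_urls` hash key are hypothetical names. With the real module you would write `use parent 'WWW::Mechanize';` instead of subclassing the stub.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::Mechanize so this sketch runs offline.
# Its get() just counts "real" fetches and returns a fake response string.
package StubMech;

sub new {
    my ($class) = @_;
    return bless { fetches => 0 }, $class;
}

sub get {
    my ($self, $url) = @_;
    $self->{fetches}++;
    return "response for $url";
}

# Hypothetical subclass that fetches each URL at most once per object.
# With the real module, use: use parent 'WWW::Mechanize';
package OnceMech;
use parent -norequire, 'StubMech';    # -norequire: StubMech is defined in this file

sub get {
    my ($self, $url, @rest) = @_;
    my $key = "$url";    # stringify in case a URI object is passed

    # Already fetched: return the remembered response without a new request.
    return $self->{_seen_urls}{$key} if exists $self->{_seen_urls}{$key};

    # First time: delegate to the parent class and remember the result.
    return $self->{_seen_urls}{$key} = $self->SUPER::get($url, @rest);
}

package main;

my $mech = OnceMech->new;
$mech->get('http://google.com');     # fetched
$mech->get('http://example.com');    # fetched
$mech->get('http://google.com');     # skipped, remembered from the first call
print "real fetches: $mech->{fetches}\n";
```

Two caveats if you apply this to a real WWW::Mechanize object: returning a remembered response does not make it the current page again (content(), forms and links still reflect the last real fetch), and the hash grows with every distinct URL visited.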