I have a web scraping application written in OO Perl. There's a single WWW::Mechanize object used in the app. How can I make it not fetch the same URL twice, i.e. make a second get() with the same URL a no-op:
my $mech = WWW::Mechanize->new();
my $url = 'http://google.com';
$mech->get( $url ); # first time, fetch
$mech->get( $url ); # same url, do nothing
You can subclass WWW::Mechanize and redefine the get() method to do what you want:
package MyMech;
use base 'WWW::Mechanize';

sub get {
    my $self = shift;
    my ($url) = @_;

    # Fetch only if there is no previous response, or the last URL differs
    if ( !defined $self->res || $self->res->request->uri ne $url ) {
        return $self->SUPER::get(@_);
    }

    # Same URL as the last request: return the cached response, skip the network
    return $self->res;
}

1;