I am using WWW::Mechanize to crawl sites, and it works great, except that sometimes it hits a page that returns error code 404 or 500 (not found or internal server error), and then my script just exits and stops running. This is really messing with my data collection. Is there any way to have WWW::Mechanize let me catch these errors and see which error code was returned (e.g. 404, 500, etc.)? Thanks for the help!
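You need to disable autocheck. When it is enabled (the default in recent versions), WWW::Mechanize dies on any failed request, which is why your script exits: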
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->get("http://somedomain.com");
if ( $mech->success() ) {
    ...
}
else {
    print "status is: " . $mech->status;
}
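For a crawler, the same check can go inside the loop so that one bad page is recorded and the run carries on. This is only a rough sketch: the @urls list and the %failed hash are made-up names for illustration, and the second URL is a placeholder that may or may not actually return an error.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 stops Mechanize from dying on failed requests
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Hypothetical URL list; replace with the pages your crawler visits
my @urls = (
    'http://somedomain.com/',
    'http://somedomain.com/missing-page',
);

my %failed;    # url => numeric HTTP status code

for my $url (@urls) {
    $mech->get($url);

    if ( $mech->success ) {
        printf "OK   %s (%d bytes)\n", $url, length $mech->content;
    }
    else {
        # status() gives the numeric code; the underlying HTTP::Response
        # object provides the full status line, e.g. "404 Not Found"
        $failed{$url} = $mech->status;
        warn sprintf "FAIL %s: %s\n", $url, $mech->response->status_line;
    }
}

printf "%d of %d requests failed\n", scalar( keys %failed ), scalar @urls;
```

If you would rather leave autocheck on, wrapping each get call in eval { ... } and inspecting $@ should also keep the script alive, though checking success and status as above is usually simpler.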
Also, as an aside, have a look at WWW::Mechanize::Cached::GZip and WWW::Mechanize::Cached to speed up your development when testing your mech scripts.