I'm trying to detect if a link is broken or not, as in if it's a web address I could paste into my browser and find a web page. I've tried two methods so far that I found online and both are giving me false positives (LWP::UserAgent and LWP::Simple).
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0"); # Pretend to be Mozilla
my $req = HTTP::Request->new(GET => "$url1");
my $res = $ua->request($req);
if ($res->is_success) {
print "Success!\n";
} else {
print "Error: " . $res->status_line . "\n";
}
$req = HTTP::Request->new(GET => "$url2");
$res = $ua->request($req);
if ($res->is_success) {
print "Success!\n";
} else {
print "Error: " . $res->status_line . "\n";
}
Which is giving me output of:
Success!
Success!
and then there's
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';
if (head("$url1")) {
print "Yes\n";
} else {
print "No\n";
}
if (head("$url2")) {
print "Yes\n";
} else {
print "No\n";
}
Which is giving me an output of:
Yes
Yes
Am I missing something here?
Your code worked fine for me, I can only see a problem if your running behind a VPN or gateway as previous stated. Always use strict
and warnings
, and here is an alternative way so you are not initializing a new Request object everytime you want to check for a valid link.
use strict;
use warnings;
use LWP::UserAgent;
sub check_url {
my ($url) = @_;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(HEAD => $url);
my $res = $ua->request($req);
return $res->status_line if $res->is_error;
return "Success: $url";
}