
Detect a broken link (web) in Perl


I'm trying to detect whether a link is broken, i.e. whether it's a web address I could paste into my browser and get a web page. I've tried two methods I found online (LWP::UserAgent and LWP::Simple), and both give me false positives.

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;

my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';


my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0");  # Pretend to be Mozilla

my $req = HTTP::Request->new(GET => "$url1");
my $res = $ua->request($req);

if ($res->is_success) {
    print "Success!\n";
} else {
    print "Error: " . $res->status_line . "\n";
}

$req = HTTP::Request->new(GET => "$url2");
$res = $ua->request($req);

if ($res->is_success) {
    print "Success!\n";
} else {
    print "Error: " . $res->status_line . "\n";
}

This gives the output:

Success!
Success!
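
A minimal diagnostic sketch, assuming libwww-perl is installed: instead of only testing `is_success`, look at where the request actually ended up. After LWP follows redirects, `$res->request` is the last request it sent, and LWP tags responses it generated itself (for example after a DNS or connection failure) with a `Client-Warning: Internal response` header. The helper name `describe_response` is hypothetical.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Summarize who actually produced an LWP response (hypothetical helper).
sub describe_response {
    my ($res) = @_;

    # After redirects, $res->request is the *last* request LWP sent,
    # so its URI shows where the fetch really ended up.
    my $final = $res->request ? $res->request->uri : '(no request)';

    # LWP marks responses it created itself (DNS/connect failures, etc.)
    # with "Client-Warning: Internal response".
    my $warning  = $res->header('Client-Warning') // '';
    my $internal = $warning eq 'Internal response' ? 'yes' : 'no';

    return sprintf '%s | final URL: %s | generated by LWP: %s',
        $res->status_line, $final, $internal;
}
```

For example, printing `describe_response($res)` right after the request for `$url2` should show whether a real server answered (and at which URL) rather than LWP reporting a lookup failure.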

Then there's the LWP::Simple version:

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';

if (head("$url1")) {
    print "Yes\n";
} else {
    print "No\n";
}

if (head("$url2")) {
    print "Yes\n";
} else {
    print "No\n";
}

This gives the output:

Yes
Yes

Am I missing something here?
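A similar check for the LWP::Simple version, again only a sketch: in list context `head` returns the content type, document length, modification time, expiry time and `Server` header instead of a single true/false value, so the `Server` value can hint at which machine answered for the nonexistent domain. The helper name `head_server` is hypothetical.

```perl
use strict;
use warnings;
use LWP::Simple qw(head);

# Hypothetical helper: return the Server header of whatever answered
# a HEAD request, or undef if the request failed.
sub head_server {
    my ($url) = @_;

    # List context: ($type, $length, $mtime, $expires, $server) on
    # success, an empty list on failure.
    my @info = head($url);
    return unless @info;    # request failed

    return $info[4] // '(no Server header)';
}
```

Usage: `print head_server($url2) // 'request failed', "\n";`. A `Server` value you don't expect for a made-up domain suggests something between you and the internet is answering on its behalf.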


Solution

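  • Your code works fine for me. I can only see a problem if you're running behind a VPN or gateway, as others have pointed out: some ISPs, VPNs and gateways answer DNS lookups for nonexistent domains with the address of their own search or advertising page, so LWP gets a 200 response from that server instead of a lookup error. Always use strict and warnings. Here is an alternative that wraps the check in a subroutine, sends a HEAD request (headers only, no body), and creates the user agent once instead of once per link:

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    # Create the user agent once and reuse it for every check.
    my $ua = LWP::UserAgent->new(timeout => 10);

    # Returns the status line for 4xx/5xx responses (including errors
    # LWP generates itself, such as DNS failures), else a success message.
    sub check_url {
      my ($url) = @_;
      my $req = HTTP::Request->new(HEAD => $url);
      my $res = $ua->request($req);
      return $res->status_line if $res->is_error;
      return "Success: $url";
    }

    print check_url($_), "\n"
      for 'http://www.gutenberg.org',
          'http://www.gooasdfzzzle.com.no/thisisnotarealsite';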
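
If a VPN, gateway or ISP resolver is the cause, one rough way to detect it before trusting any link check is to look up a name that should not exist. This sketch uses only the core `Socket` module (IPv4 lookups only); the randomly generated `.com` probe name and the helper names `resolves` and `dns_is_hijacked` are assumptions for illustration, and the result is a heuristic, not a guarantee.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# True (1) if $host maps to an IPv4 address. inet_aton accepts a
# dotted-quad string directly, or looks a hostname up via the resolver.
sub resolves {
    my ($host) = @_;
    return defined inet_aton($host) ? 1 : 0;
}

# Heuristic: a random 20-letter .com label is almost certainly not a
# registered domain. If it still resolves, the resolver is probably
# rewriting failed lookups, so HTTP "success" for bogus URLs means little.
sub dns_is_hijacked {
    my @letters = ('a' .. 'z');
    my $label   = join '', map { $letters[int rand @letters] } 1 .. 20;
    return resolves("$label.com");
}
```

Usage: `warn "DNS answers for nonexistent names; link checks may give false positives\n" if dns_is_hijacked();`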