
How can I get links that match a regex using WWW::Mechanize?


I'm trying to use a regular expression to pick out certain links, but I can't get it to work. I can retrieve all the links on the page, but many of them are links I don't want.

What I want is to grab only the links that match this pattern: http://valeptr.com/scripts/runner.php?IM=

Here is the script I have so far:

use warnings;
use strict;
use WWW::Mechanize;
use WWW::Mechanize::Sleepy;

my $Explorador = WWW::Mechanize->new(
    agent => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
    sleep => '5..20',
);

# Fetch the page in order to find all the links in the emails
$Explorador->get("file:/home/alejandro/Escritorio/hehe.php.html");

# Uncomment the next line to dump the page content for debugging.
#print $Explorador->content();

my @links = $Explorador->links;

foreach my $link (@links) {

   # Retrieve the link URL like:
   # http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375.
   my $href = $link->url;

   foreach my $s ($href) { # The regular expression goes here

       my @links = $s =~ qr{
                               (
                               [^B]*
                               )
                               $
                           }x;
       foreach (@links) {
           print "\n",$_;
       }
   }
} 

PS: I suspect this regular-expression question has been asked many times already, but I could not find it. If so, I apologize for posting a duplicate.

Problem: There are a lot of links on the page, and I need to pick out only the ones that match the pattern http://valeptr.com/scripts/runner.php?IM= To do that, I need to apply a regular expression on line 19 of my script. The statement my @links = $Explorador->links; returns every link on the page, but I only want the links that match the pattern above. Thanks.


Solution

  • Why not let WWW::Mechanize do the work for you? Its find_all_links method can filter the links against a regex that you supply:

    my @wanted_links = $Explorador->find_all_links(
        url_regex => qr{scripts/runner\.php\?IM=}
    );
    

    No for loops!
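
    To see the filter in action without touching the network, here is a minimal sketch. It assumes a small, hypothetical HTML page written to a temporary file (standing in for the saved hehe.php.html), and the variable names $mech and @urls are only illustrative. Each element returned by find_all_links is a WWW::Mechanize::Link object, so url is called on it to get the href.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical sample page standing in for the asker's saved HTML file.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} <<'HTML';
<html><body>
<a href="http://valeptr.com/scripts/runner.php?IM=0cdb7d48110375.">wanted</a>
<a href="http://valeptr.com/about.html">unwanted</a>
<a href="http://valeptr.com/scripts/runner.php?IM=abc123">also wanted</a>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

# Load the local file through a file: URL, as the question's script does.
my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new($path)->as_string );

# Keep only the links whose href matches the pattern.
my @wanted_links = $mech->find_all_links(
    url_regex => qr{scripts/runner\.php\?IM=}
);

# Each result is a WWW::Mechanize::Link object; url() returns its href.
my @urls = map { $_->url } @wanted_links;
print "$_\n" for @urls;
```

    With the sample page above, only the two runner.php links should be printed, in document order. Note that url_regex is matched against the href exactly as it appears in the page; if the page uses relative links, the url_abs_regex criterion matches against the absolute URL instead.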