
Perl Question with UserAgent Get Website on Loop


I'm able to grab the first image fine, but then the content seems to be looping inside itself. Not sure what I'm doing wrong.

#!/usr/bin/perl
use LWP::Simple;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
for(my $id=1;$id<55;$id++)
{
    my $response = $ua->get("http://www.gamereplays.org/community/index.php?act=medals&CODE=showmedal&MDSID=" . $id );
    my $content = $response->content;    
        for(my $id2=1;$id2<10;$id2++)
        {
                $content =~ /<img src="http:\/\/www\.gamereplays.org\/community\/style_medals\/(.*)$id2\.gif" alt=""\/>/;
                $url = "http://www.gamereplays.org/community/style_medals/" . $1 . $id2 . ".gif";
  print "--\n\r";
  print "ID: ".$id."\n\r";
  print "ID2: ".$id2."\n\r";
  print "URL: ".$url."\n\r";
  print "1: ".$1."\n\r";
  print "--\n\r";
  getstore($url, $1 . $id2 . ".gif");
        }
}

Solution

  • As others have stated, this is really a job for an HTML parser such as HTML::Parser. You should also add 'use strict;'. Keep 'use LWP::Simple;' though: getstore() is exported by that module, so the script needs it to save the images.

    You could change your regex to the following:

    $content =~ m{http://www\.gamereplays\.org/community/style_medals/([\w\_]+)$id2\.gif}s;
    

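    But that still won't fetch style_medals/comp_graphics_10.gif, because $id2 only runs from 1 to 9 and so never matches a name ending in 0.gif. That may or may not be what you want. I think something like the following would work better. My apologies for the style changes, but I can't resist modifying for Perl Best Practices (PBP).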

    #!/usr/bin/perl                                                                 
    
    use strict;
    use warnings;
    use LWP::UserAgent;
    use LWP::Simple qw(getstore);   # getstore() is used below to save each image
    use Carp;
    
    my $ua = LWP::UserAgent->new();
    
    # Fetch pages 1 to 54 (the loop stops before 55).  Are we sure there are no more pages?
    # Perhaps consider running until a 404 is found.
    for (my $id = 1; $id < 55; $id++) {
    
        # Get the page data                                                         
        my $response = $ua->get( 'http://www.gamereplays.org/community/index.php?act=medals&CODE=showmedal&MDSID='.$id );
    
        # Check for failure and abort (get() always returns a response object)
        if (!$response->is_success) {
            croak 'Request failed! '.$response->status_line();
        }
    
        my $content = $response->content();
    
        # Run this loop each time we find the url                                   
      CONTENT_LOOP:
        while ($content =~ s{<img src="(http://www\.gamereplays\.org/community/style_medals/([^\"]+))" }{}ms) {
    
            my $url   = $1;  # The entire url, no need to recreate the domain       
            my $file  = $2;  # Just the file name portion                           
            my ($id2) = $file =~ m{ _(\d+)\.gif \Z}xms; # extract id2 for debug     
    
            next CONTENT_LOOP if !defined $id2;         # Handle SOTW.gif file(s)   
    
            # Display stats about each id found                                     
            print "--\n";
            print "ID:  $id\n";
            print "ID2: $id2\n";
            print "URL: $url\n";
            print "1:   $file\n";
            print "--\n";
    
            # You might want to consider involving the $id in the filename as       
            # you could have the same filename on multiple pages                    
            getstore( $url, $file);
        }
    }
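
For comparison, here is a rough sketch of the parser-based approach mentioned at the top of this answer. It is not a drop-in replacement. It assumes the HTML::Parser module from CPAN is installed, and the medal_images() helper, the sample markup and the "page id prefix" naming scheme are all made up for illustration. It shows three ideas from above in one place: letting a parser find the img tags, using the captures only when the match succeeded (so files such as SOTW.gif are skipped), and including the page id in the local file name so that equal names on different pages do not overwrite each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect medal images from one page of HTML.
# $page_id is the MDSID the page was fetched with; it is prefixed to the
# local file name so equal names on different pages do not collide.
sub medal_images {
    my ($page_id, $html) = @_;
    my @found;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                return if $tag ne 'img' || !defined $attr->{src};

                # Only use $1 and $2 inside the if, i.e. after a successful match.
                # Images without a numeric suffix (e.g. SOTW.gif) are skipped.
                if ($attr->{src} =~ m{/style_medals/([^/"]+_(\d+)\.gif)\z}) {
                    push @found, {
                        url   => $attr->{src},
                        id2   => $2,
                        local => sprintf('%02d_%s', $page_id, $1),
                    };
                }
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;

    return @found;
}

# Made-up sample markup standing in for $response->content().
my $base   = 'http://www.gamereplays.org/community/style_medals';
my $sample = join "\n",
    qq{<img src="$base/comp_graphics_1.gif" alt="">},
    qq{<img src="$base/SOTW.gif" alt="">},
    qq{<img src="$base/comp_graphics_10.gif" alt="">};

for my $img (medal_images(7, $sample)) {
    print "$img->{id2}: $img->{url} -> $img->{local}\n";
}
```

In the real script you would pass $id and $response->content() to the helper and then call getstore( $img->{url}, $img->{local} ) for each result.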