Perl Question with UserAgent Get Website on Loop

I'm able to grab the first image fine, but then the content seems to be looping inside itself. Not sure what I'm doing wrong.

#!/usr/bin/perl
use LWP::Simple;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
for(my $id=1;$id<55;$id++)
{
    my $response = $ua->get("http://www.gamereplays.org/community/index.php?act=medals&CODE=showmedal&MDSID=" . $id );
    my $content = $response->content;    
        for(my $id2=1;$id2<10;$id2++)
        {
                $content =~ /<img src="http:\/\/www\.gamereplays.org\/community\/style_medals\/(.*)$id2\.gif" alt=""\/>/;
                $url = "http://www.gamereplays.org/community/style_medals/" . $1 . $id2 . ".gif";
  print "--\n\r";
  print "ID: ".$id."\n\r";
  print "ID2: ".$id2."\n\r";
  print "URL: ".$url."\n\r";
  print "1: ".$1."\n\r";
  print "--\n\r";
  getstore($url, $1 . $id2 . ".gif");
        }
}

Solution

As others have stated, this is really a job for an HTML::Parser. Also, you should 'use strict;' and remove use LWP::Simple as you're not using the library.

You could change your regex to the following:

$content =~ m{http://www\.gamereplays\.org/community/style_medals/([\w\_]+)$id2\.gif}s;

But you won't get style_medals/comp_graphics_10.gif - which may be what you want. I think something like the following would work better. My apologies for the style changes but I can't resist modifying for PBP.

#!/usr/bin/perl                                                                 

use LWP::UserAgent;
use Carp;
use strict;

my $ua = LWP::UserAgent->new();

# Fetch pages from 1 to 55.  Are we sure we won't have page 56?                 
# Perhaps consider running until a 404 is found.                                
for (my $id = 1; $id < 55; $id++) {

    # Get the page data                                                         
    my $response = $ua->get( 'http://www.gamereplays.org/community/index.php?ac\
t=medals&CODE=showmedal&MDSID='.$id );

    # Check for failure and abort                                               
    if (!defined $response || !$response->is_success) {
        croak 'Request failed! '.$response->status_line();
    }

    my $content = $response->content();

    # Run this loop each time we find the url                                   
  CONTENT_LOOP:
    while ($content =~ s{<img src="(http://www\.gamereplays\.org/community/styl\
e_medals/([^\"]+))" }{}ms) {

        my $url   = $1;  # The entire url, no need to recreate the domain       
        my $file  = $2;  # Just the file name portion                           
        my ($id2) = $file =~ m{ _(\d+)\.gif \Z}xms; # extract id2 for debug     

        next CONTENT_LOOP if !defined $id2;         # Handle SOTW.gif file(s)   

        # Display stats about each id found                                     
        print "--\n";
        print "ID:  $id\n";
        print "ID2: $id2\n";
        print "URL: $url\n";
        print "1:   $file\n";
        print "--\n";

        # You might want to consider involving the $id in the filename as       
        # you could have the same filename on multiple pages                    
        getstore( $url, $file);
    }
}