Good evening, dear community!
I want to process multiple web pages, kind of like a web spider/crawler might. I have some bits working, but now I need some improved spider logic. See the target URL http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50
Update:
Thanks to two great comments I have gained a lot, and the code now runs very nicely. Last question: how do I store the data in a file, i.e. how do I make the parser write its results into a file instead of printing more than 6000 records to the command line? That would be much more convenient. (A sketch for the file-writing part follows after the sample output below.) And once the output is in a file, I still need to do some final cleanup. See the output: if we compare it with the target URL, it surely needs some cleanup, don't you think? Again, see the target URL http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50
6114,7754,"Volksschule Zeil a.Mai",/Sa,"d a.Mai",(Gru,"09524/94992 09524/94997",,Volksschulen,
6115,7757,"Mittelschule Zeil - Sa","d a.Mai",Schulri,"g
97475 Zeil","09524/94995
09524/94997",,Volksschulen," www.hauptschule-zeil-sand.de"
6116,3890,"Volksschule Zeilar",(Gru,"dschule)
Bgm.-Stallbauer-Str. 8
84367 Zeilar",,"08572/439
08572/920001",,Volksschulen," www.gs-zeilarn.de"
6117,4664,"Volksschule Zeitlar",(Gru,"dschule)
Schulstraße 5
93197 Zeitlar",,"0941/63528
0941/68945",,Volksschulen," www.vs-zeitlarn.de"
6118,4818,"Mittelschule Zeitlar","Schulstra�e 5
93197 Zeitlar",,,"0941/63528
0941/68945",,Volksschulen," www.vs-zeitlarn.de"
6119,7684,"Volksschule Zeitlofs (Gru","dschule)
Raiffeise","Str. 36
97799 Zeitlofs",,"09746/347
09746/347",,Volksschulen," grundschule-zeitlofs.de"
Thanks for any and all info! zero
Here is the old question: the code seems to work fine as part of a one-shot function, but as soon as I include the function in a loop, it doesn't return anything... What's the deal?
To begin at the beginning, see the target http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50. This page has more than 6000 results! So how do I get all of them? I use the module LWP::Simple, and I need some improved arguments so that I can fetch all 6150 records. I have code that stems from the very supportive member tadmic (see this forum), and it basically runs very nicely. But after adding some lines it (at the moment) spits out some errors.
Attempt: Here are the first 5 page URLs:
http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=0
http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=50
http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=100
http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=150
http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=200
We can see that the "s" parameter in the URL starts at 0 for page 1 and then increases by 50 for each page thereafter. We can use this information to create a loop:
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;
use Text::CSV;
my @cols = qw(
rownum
number
name
phone
type
website
);
my @fields = qw(
rownum
number
name
street
postal
town
phone
fax
type
website
);
# page offsets: s=0, 50, 100, ... 6100
my $i_first    = 0;
my $i_last     = 6100;
my $i_interval = 50;

my $csv = Text::CSV->new({ binary => 1 });

for (my $i = $i_first; $i <= $i_last; $i += $i_interval) {
    my $html = get("http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50&s=$i");
    defined $html or next;     # skip this page if the request failed

    $html =~ tr/\r//d;         # strip the carriage returns
    $html =~ s/&nbsp;/ /g;     # expand the non-breaking spaces

    my $te = HTML::TableExtract->new();
    $te->parse($html);

    foreach my $ts ($te->table_states) {
        foreach my $row ($ts->rows) {
            # trim leading/trailing whitespace from base fields
            # (cells can be undef, hence the defined guard)
            defined and s/^\s+//, s/\s+$// for @$row;

            # load the fields into the hash using a "hash slice"
            my %h;
            @h{@cols} = @$row;

            # derive some fields from base fields, again using a hash slice
            @h{qw/name street postal town/} = split /\n+/, $h{name}  // '';
            @h{qw/phone fax/}               = split /\n+/, $h{phone} // '';

            # trim leading/trailing whitespace from derived fields
            # (header rows yield fewer parts, so guard against undef here too)
            defined and s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

            $csv->combine(@h{@fields});
            print $csv->string, "\n";
        }
    }
}
I tested the code and got the results below. By the way, the command line reported errors at lines 57 and 58 of my original version of the script, i.e. these two lines:

#trim leading/trailing whitespace from derived fields
s/^s+//, s/\s+$// for @h{qw/name street postal town/};

What do you think? Are there some backslashes missing, i.e. should s/^s+// really be s/^\s+//? How do I fix and test-run the code so that the results come out correctly?
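To illustrate the difference for anyone else reading, here is a tiny standalone test (not part of the script):

#!/usr/bin/perl
use strict;
use warnings;

my $s = "Grundschule\r\n";
(my $a = $s) =~ tr/r//d;    # deletes the letter "r": "Gundschule\r\n"
(my $b = $s) =~ tr/\r//d;   # deletes the carriage return: "Grundschule\n"
print $a, $b;

The same applies to s/^s+// versus s/^\s+// and split /n+/ versus split /\n+/: without the backslash, the literal letters s, r and n get matched, which is exactly why letters are missing from the output above.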
Looking forward to hearing from you! zero
Here are the errors I got:
Ot",,,Telefo,Fax,Schulat,Webseite Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58. "lfd. N.",Schul-numme,Schul,"ame
Sta�e
PLZ
Ot",,,Telefo,Fax,Schulat,Webseite
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
"lfd. N.",Schul-numme,Schul,"ame
Staße
PLZ
Ot",,,Telefo,Fax,Schulat,Webseite
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
Use of uninitialized value $_ in substitution (s///) at bavaria_all_guru.pl line 58.
"lfd. N.",Schul-numme,Schul,"ame
If you're trying to extract links from the pages, use WWW::Mechanize, which is a wrapper around LWP and properly parses the HTML to get the links for you, as well as a zillion other convenience things for people scraping web pages.
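For example, here is a minimal sketch of the Mechanize approach (untested; the URL is the one from the question):

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=50');

# print every link found on the page: URL and link text
for my $link ($mech->links) {
    printf "%s\t%s\n", $link->url, $link->text // '';
}

From there, $mech->content gives you the fetched HTML, so the HTML::TableExtract part of the script above keeps working unchanged.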