Tags: regex, perl, firefox, firefox-addon, www-mechanize

WWW::Mechanize::Firefox - almost there - only a small regex error left


Perl sometimes looks a bit like abracadabra to me, so many thanks for your patience.

Update: there were some errors until user1269651 and Bodoin offered a great fix.

See the results of Bodoin's code below. (Note that he has changed the code once; I used the first version here.)

linux-wyee:/home/martin/perl # perl test_7.pl
http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
Use of uninitialized value $png in print at test_7.pl line 25, <$urls> line 10.
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch
http://www.ph-solothurn.ch
http://www.pfh-gr.ch
Got status code 500 at test_7.pl line 14
linux-wyee:/home/martin/perl # 

With the latest version of Bodoin's code, some of the results look like this:

Can't call method "addProgressListener" on an undefined value at /usr/lib/perl5/site_perl/5.14.2/WWW/Mechanize/Firefox.pm line 566, <$urls> line 12.

So some minor things are left, see above. What can we do about those small errors? By the way, what about storing the results in a folder (called images or similar)?
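
The two problems in the output above ("Got status code 500" ending the run, and the "uninitialized value $png" warning) plus the folder idea could be handled roughly as in the sketch below. It is only an outline under assumptions: `save_png` and `snapshot_all` are made-up helper names, `$fetch` is a code ref standing in for whatever retrieves the image, and the URL-to-name mapping is passed in as a hash ref.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: write PNG data to $dir/$name, creating $dir if needed.
# Returns the path written, or nothing when there is no image data.
sub save_png {
    my ($dir, $name, $png) = @_;
    unless (defined $png && length $png) {
        warn "no image data for $name, skipping\n";
        return;
    }
    make_path($dir) unless -d $dir;
    my $path = File::Spec->catfile($dir, $name);
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    binmode($out);
    print {$out} $png;
    close($out) or die "Cannot close $path: $!";
    return $path;
}

# Hypothetical driver: $fetch returns PNG data for a URL and may die
# (for example with "Got status code 500"); $name_for maps URL => file name.
sub snapshot_all {
    my ($fetch, $dir, $name_for) = @_;
    my @saved;
    for my $url (sort keys %$name_for) {
        my $png;
        my $ok = eval { $png = $fetch->($url); 1 };
        unless ($ok) {
            warn "skipping $url: $@";   # one failing site no longer ends the run
            next;
        }
        my $path = save_png($dir, $name_for->{$url}, $png) or next;
        push @saved, $path;
    }
    return @saved;
}
```

With the script from this thread, `$fetch` could be `sub { $mech->get($_[0]); $mech->content_as_png() }` and `$dir` could be `'images'`; both calls are the ones the script already uses.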

End of update.

Here the initial thread starts, with an outline of what is wanted:

I need thumbnails of a set of websites. I tried wget, but that does not work for me because I need the pages rendered. What is needed: I have a list of 2,500 URLs, one per line, saved in a file. I want a script (see below) that opens the file, reads a line, retrieves the website, and saves an image of it as a small thumbnail.

Since I have a large number of websites (2,500), I have to think about how to name the results.

http://www.unifr.ch/sfm
http://www.zug.phz.ch
http://www.schwyz.phz.ch
http://www.luzern.phz.ch
http://www.schwyz.phz.ch
http://www.phvs.ch
http://www.phtg.ch
http://www.phsg.ch
http://www.phsh.ch
http://www.phr.ch
http://www.hepfr.ch/
http://www.phbern.ch

So far so good. I think I will try something like this.

We also have to close a filehandle when we no longer need it. Besides this, we can use 'or die' on open. I did that, see below.

By the way, we need good file names. Since I have a huge list of URLs, I will get a huge list of output files. Can we reflect those needs in the program?

The script does not start at all...

#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = new WWW::Mechanize::Firefox();

open(INPUT, "<urls.txt") or die $!;

while (<INPUT>) {
        chomp;
        next if $_ =~ m/http/i;
        print "$_\n";
        $mech->get($_);
        my $png = $mech->content_as_png();
        my $name = "$_";
        $name =~s#http://##is;
        $name =~s#/##gis;$name =~s#\s+\z##is;$name =~s#\A\s+##is;
        $name =~s/^www\.//;
        $name .= ".png";
        open(my $out, ">",$name) or die $!;
        binmode($out);
        print $out $png;
        close($out);
        sleep (5);
}
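
One likely reason the script above appears to do nothing: `next if $_ =~ m/http/i;` skips every line that contains "http", which is every URL in the list, so the rest of the loop body never runs. Below is a minimal sketch of the inverted test, assuming the intent was to skip blank or non-URL lines. The helper name `wanted_urls` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that look like http(s) URLs.
sub wanted_urls {
    my @lines = @_;                           # work on a copy of the input
    my @urls;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;              # trim whitespace, including the newline
        next unless $line =~ m{^https?://}i;  # "unless", not "if": skip NON-URL lines
        push @urls, $line;
    }
    return @urls;
}

my @sample = ("http://www.unifr.ch/sfm\n", "\n", "# a comment\n", "http://www.phbern.ch\n");
print "$_\n" for wanted_urls(@sample);
```

In the original loop the same effect comes from changing `next if` to `next unless`.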

Solution

  • I came up with this:

    while (my $name = <DATA>) {
            chomp ($name) ;
    
            #$mech->get($_);
            #my $png = $mech->content_as_png();
            $name =~ s#http://##;  #REMOVE THIS LINE
    
            $name =~s#/#-#gis;
            $name =~s#\s+\z##is;$name =~s#\A\s+##is;
    
            $name =~s/^www\.//;
    
            $name .= ".png";
    
            print $name . "\n\n";   #REMOVE THIS LINE       
            #open(my $out, ">",$name) or die $!;
            #binmode($out);
            #print $out $png;
            #close($out);
            #sleep (5);
    }
    
    
    __DATA__
    http://www.unifr.ch/sfm
    http://www.zug.phz.ch
    http://www.schwyz.phz.ch
    http://www.luzern.phz.ch
    http://www.schwyz.phz.ch
    http://www.phvs.ch
    http://www.phtg.ch
    http://www.phsg.ch
    http://www.phsh.ch
    http://www.phr.ch
    http://www.hepfr.ch/
    http://www.phbern.ch
    

    You should be able to modify it for your needs; I commented out everything but the regex parts. I also changed one regex to replace a '/' with a '-' so that there is less chance of accidentally generating duplicate file names.

    So that http://www.unifr.ch/sfm will become: unifr.ch-sfm.png

    Hope this helps
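
Two details of the URL list are worth a second look with the regexes above: `http://www.hepfr.ch/` ends in a slash, so it would become `hepfr.ch-.png`, and `http://www.schwyz.phz.ch` appears twice, so the second image would overwrite the first. The sketch below is one possible way to cover both, not the answer's own code. `url_to_filename` and the `_1` suffix scheme are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a URL into a file name. Names already handed out
# are counted in %$seen so that a repeated URL gets a numeric suffix.
sub url_to_filename {
    my ($url, $seen) = @_;
    my $name = $url;
    $name =~ s/^\s+|\s+$//g;            # trim surrounding whitespace
    $name =~ s{^https?://}{}i;          # drop the scheme
    $name =~ s{^www\.}{}i;              # drop a leading "www."
    $name =~ s{/+$}{};                  # drop trailing slashes
    $name =~ s{[^A-Za-z0-9._-]+}{-}g;   # replace "/" and other unsafe characters
    my $count = $seen->{$name}++;
    $name .= "_$count" if $count;       # second occurrence becomes name_1
    return "$name.png";
}

my %seen;
print url_to_filename($_, \%seen), "\n"
    for qw(http://www.unifr.ch/sfm http://www.hepfr.ch/
           http://www.schwyz.phz.ch http://www.schwyz.phz.ch);
```

The suffix keeps both copies of a repeated URL; dropping duplicates from the input list before fetching would be an equally reasonable choice.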