Tags: perl, robots.txt, lwp

How to specify your own robots.txt rules for LWP::RobotUA


I wrote a script to check my own websites with LWP::RobotUA, and I would like to avoid the frequent requests it makes for my robots.txt.

The rules parameter of LWP::RobotUA should let me supply those rules directly, but I don't quite understand what I should pass to mean "allow all pages".

my $ua = LWP::RobotUA->new(agent=>'my-robot/0.1', from=>'me@foo.com', rules=> ??? );

Solution

  • After more research, I think the intended way to supply your own robots rules is to subclass WWW::RobotRules and override its allowed method:

    {
        package WWW::NoRules;
        use vars qw(@ISA);
        use WWW::RobotRules;
        @ISA = qw(WWW::RobotRules::InCore);
    
        # Report every URL as allowed, so LWP::RobotUA never
        # blocks a request on robots.txt grounds.
        sub allowed {
            return 1;
        }
    }
    
    my $ua = LWP::RobotUA->new(agent=>'my-robot/0.1', from=>'me@foo.com', rules=>WWW::NoRules->new);
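  • Alternatively, here is a minimal sketch (not from the original answer) that avoids subclassing: pre-seed a stock WWW::RobotRules object by parsing an empty robots.txt for your host, since an empty file places no restrictions. The http://example.com URL is a placeholder for your own site; note this only pre-seeds that one host, so requests to other hosts would still trigger a robots.txt fetch.

    use LWP::RobotUA;
    use WWW::RobotRules;
    
    # Pre-seed the rules cache: an empty robots.txt imposes
    # no restrictions, so every URL on that host is allowed.
    my $rules = WWW::RobotRules->new('my-robot/0.1');
    $rules->parse('http://example.com/robots.txt', '');
    
    my $ua = LWP::RobotUA->new(
        agent => 'my-robot/0.1',
        from  => 'me@foo.com',
        rules => $rules,
    );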