python perl web-scraping block http-status-code-403

Web Server Block libwww-perl requests


I have a web-scraping script written in Perl. At first it works perfectly, but after about 3,500 GET requests the server returns a 403 error (Forbidden; my IP is not banned). When I run the same script rewritten in Python, I hit the same problem: it works, but after about 3,500 requests I get a 403, and it starts working again after 24 hours. I don't know what the problem is or how to fix it.

I read about servers blocking libwww-perl requests:

https://cloudkul.com/blog/block-libwww-perl-attack-in-apache-web-server/


Solution

  • Use the agent method provided by LWP::UserAgent to change the "user agent identification string".

    It should solve blocking based on the client identification string.
    It will not solve blocking based on abusive behavior.

    perldoc LWP::UserAgent

    agent

      my $agent = $ua->agent;
      $ua->agent('Checkbot/0.4 ');    # append the default to the end
      $ua->agent('Mozilla/5.0');
      $ua->agent("");                 # don't identify
    

    Get/set the product token that is used to identify the user agent on the network. The agent value is sent as the User-Agent header in the requests.

    The default is a string of the form libwww-perl/#.###, where #.### is substituted with the version number of this library.

    If the provided string ends with space, the default libwww-perl/#.### string is appended to it.

    The user agent string should be one or more simple product identifiers with an optional version number separated by the / character.
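A rough Python equivalent (the question mentions a Python version of the script too) covering both points above: send a non-default User-Agent, and space the requests out so the traffic looks less abusive. This is a minimal sketch using only the standard library's urllib; the class name PoliteFetcher, the User-Agent string and the 2-second interval are hypothetical choices, not values known to satisfy this server. If the block really is a daily quota (the 403 lifting after 24 hours hints at that, but it is an assumption), a short delay will not help; staying under 3,500 requests per 24 hours would need an interval of roughly 25 seconds (86,400 s / 3,500 ≈ 24.7 s).

```python
import time
import urllib.error
import urllib.request

# Hypothetical values: adjust for the site you are scraping.
USER_AGENT = "Mozilla/5.0 (compatible; MyScraper/1.0)"
MIN_INTERVAL = 2.0  # seconds between requests (an assumption, not a known limit)


class PoliteFetcher:
    """GET pages with a custom User-Agent and a minimum gap between requests."""

    def __init__(self, user_agent=USER_AGENT, min_interval=MIN_INTERVAL,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self._clock = clock    # injectable so the timing can be tested
        self._sleep = sleep
        self._last = None      # time the previous request was allowed through

    def build_request(self, url):
        # urllib sends "Python-urllib/3.x" by default; override it,
        # as $ua->agent(...) does for LWP::UserAgent in Perl.
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})

    def wait_turn(self):
        # Sleep just long enough that requests are at least min_interval apart.
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()

    def get(self, url, timeout=30):
        self.wait_turn()
        try:
            with urllib.request.urlopen(self.build_request(url), timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code == 403:
                # Blocked: stop rather than keep hitting the server.
                raise RuntimeError(f"403 Forbidden for {url}; stop and retry later") from err
            raise
```

In a scraping loop you would create one `fetcher = PoliteFetcher()` and call `fetcher.get(url)` for each page. On a 403 the sketch raises instead of retrying, since continuing to hit a server that has already blocked you is likely to prolong the block.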