regex perl cloudflare lwp lwp-useragent

Waiting for CloudFlare DDoS Protection with LWP (Perl)


Edit: Ended up using WWW::Mechanize::Firefox. I answered my own question below.

I am trying to access a website and download its page. The site's Cloudflare DDoS protection occasionally kicks in, and I can't get LWP past it. I can reliably detect that a page is the Cloudflare splash page with the regex /Ray ID: [a-f0-9]*/, but whenever I request the page again I just get the same splash screen with a new Ray ID. Here is a (condensed) code sample:

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.3.0');
my $signin_url  = 'my url';
my $signin_page = $ua->get($signin_url);
# decoded_content undoes gzip/charset encoding before the regex runs
if ($signin_page->decoded_content =~ /Ray ID: ([a-f0-9]*)/i) {
    print "DDOS protection page here\n";
    # more code to retry, but it just ends up back in this branch of the if
} else {
    print "Not the DDOS page\n";
    # now I would save the page to a file
}

Since retrying with LWP doesn't work, I need another way to get past the protection page.
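The detection step from the question can be factored into a small helper. This is a minimal sketch: `cloudflare_ray_id` is a hypothetical name, and the pattern is an assumption that extends the question's regex to tolerate whitespace or HTML tags (such as `<strong>`) between `Ray ID:` and the hex digits, in case the ID is wrapped in markup. It also requires at least one hex digit, so a bare "Ray ID:" label without markup in between does not count as a match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the Ray ID if $html looks like a Cloudflare
# protection page, or undef otherwise. Assumes the page contains "Ray ID:"
# followed by hex digits, possibly with whitespace or HTML tags in between.
sub cloudflare_ray_id {
    my ($html) = @_;
    return unless defined $html;
    if ($html =~ /Ray ID:\s*(?:<[^>]*>\s*)*([0-9a-f]+)/i) {
        return $1;
    }
    return;
}

# Usage with the response object from the code above:
#   my $ray = cloudflare_ray_id($signin_page->decoded_content);
#   print "DDOS protection page here (Ray ID $ray)\n" if defined $ray;
```

This only tells you that you hit the splash page; it does not get you past it, which is the actual problem.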


Solution

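  • Cloudflare's protection page needs JavaScript, which LWP cannot execute, so I used WWW::Mechanize::Firefox, which drives a real Firefox browser (through the MozRepl add-on, which has to be installed in Firefox). Like this: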

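    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize::Firefox;

    system('firefox &'); # the & backgrounds Firefox so this script doesn't wait for it to exit
    sleep 5;             # give Firefox time to start up

    my $mech = WWW::Mechanize::Firefox->new;
    $mech->get('http://www.mycloudflareblockedwebsite.com/');
    # same check as in the question: is this the Cloudflare splash page?
    if ($mech->content =~ /Ray ID: [a-f0-9]*/i) {
        sleep 10;        # wait for Cloudflare's JavaScript check to do its thing
    }
    my $html = $mech->content; # should now be the real page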
    

    Ta-da!

    It took me a long time to figure that out.
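The fixed `sleep 10` above is a guess at how long Cloudflare's check takes. A possible refinement is to poll until the page no longer looks like the splash page, with a timeout. This is a minimal sketch: `wait_until` is a hypothetical helper, and the usage comment assumes that `$mech->content` reflects whatever page Firefox is currently showing and that the "Ray ID" text disappears once Cloudflare lets the browser through.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: calls $check repeatedly until it returns true or
# $timeout seconds have passed. Returns 1 if $check succeeded, 0 on timeout.
sub wait_until {
    my ($check, $timeout, $interval) = @_;
    $interval //= 0.5;    # seconds between checks (fractions allowed)
    my $deadline = Time::HiRes::time() + $timeout;
    while (1) {
        return 1 if $check->();
        return 0 if Time::HiRes::time() >= $deadline;
        Time::HiRes::sleep($interval);
    }
}

# Usage with the $mech object from the answer:
#   my $passed = wait_until(sub { $mech->content !~ /Ray ID: [a-f0-9]*/i }, 30, 1);
#   die "Still on the Cloudflare page after 30 seconds\n" unless $passed;
```

Polling returns as soon as the check passes instead of always waiting the full 10 seconds, and the timeout turns "Cloudflare never let us through" into an explicit error rather than silently saving the splash page.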