
What is the difference between accessing a Cloudflare-protected website using ChromeDriver/Chrome in normal and headless mode through Selenium Python?


I have a question about --headless mode in Python Selenium for Chrome.

Code

 from selenium import webdriver
 from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

 CHROME_DRIVER_DIR = "selenium/chromedriver"

 chrome_options = webdriver.ChromeOptions()
 caps = DesiredCapabilities().CHROME
 chrome_options.add_argument("--disable-dev-shm-usage")
 chrome_options.add_argument("--remote-debugging-port=9222")
 chrome_options.add_argument("--headless")  # Runs Chrome in headless mode.
 chrome_options.add_argument("--no-sandbox")  # Bypass OS security model
 chrome_options.add_argument("--disable-extensions")
 chrome_options.add_argument("--disable-gpu")

 browser = webdriver.Chrome(desired_capabilities=caps, executable_path=CHROME_DRIVER_DIR, options=chrome_options)

 browser.get("https://www.manta.com/c/mm2956g/mashuda-contractors")
 print(browser.page_source)
 browser.quit()

When I remove chrome_options.add_argument("--headless"), everything works fine, but with --headless I get the following error page:

Please enable cookies.

Error 1020 Ray ID: 53fd62b4087d8116 • 2019-12-04 11:19:28 UTC

Access denied

What happened?
This website is using a security service to protect itself from online attacks.

Cloudflare Ray ID: 53fd62b4087d8116 • Your IP: 168.81.117.111 • Performance & security by Cloudflare

What is the difference between normal mode and --headless?


Solution

  • I took your code, removed the optional arguments and added a few arguments to execute the test as follows:

    • Code Block:

      from selenium import webdriver
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
      
      options = webdriver.ChromeOptions() 
      options.add_argument("start-maximized")
      options.add_argument("--headless")
      options.add_experimental_option("excludeSwitches", ["enable-automation"])
      options.add_experimental_option('useAutomationExtension', False)
      driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
      driver.get("https://www.manta.com/c/mm2956g/mashuda-contractors")
      print(driver.page_source)
      driver.quit()
      
    • Console Output:

      <html class="js" lang="en-US" style="opacity: 1; visibility: visible;"><!--<![endif]--><head>
      <title>Access denied | www.manta.com used Cloudflare to restrict access</title>
      <meta charset="UTF-8">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">
      <meta name="robots" content="noindex, nofollow">
      <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1">
      <link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection">
      <!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
      <style type="text/css">body{margin:0;padding:0}</style>
      
      
      <!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->
      <!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->
      
      
      
      </head>
      <body>
        <div id="cf-wrapper">
          <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
          <div id="cf-error-details" class="cf-error-details-wrapper">
            <div class="cf-wrapper cf-header cf-error-overview">
          <h1>
            <span class="cf-error-type" data-translate="error">Error</span>
            <span class="cf-error-code">1020</span>
            <small class="heading-ray-id">Ray ID: 53fd7c2fca12d5fc • 2019-12-04 11:36:52 UTC</small>
          </h1>
          <h2 class="cf-subheadline">Access denied</h2>
            </div><!-- /.header -->
      
            <section></section><!-- spacer -->
      
            <div class="cf-section cf-wrapper">
          <div class="cf-columns two">
            <div class="cf-column">
              <h2 data-translate="what_happened">What happened?</h2>
              <p>This website is using a security service to protect itself from online attacks.</p>
            </div>
      
      
          </div>
            </div><!-- /.section -->
      
            <div class="cf-error-footer cf-wrapper">
        <p>
          <span class="cf-footer-item">Cloudflare Ray ID: <strong>53fd7c2fca12d5fc</strong></span>
          <span class="cf-footer-separator">•</span>
          <span class="cf-footer-item"><span>Your IP</span>: 123.201.54.43</span>
          <span class="cf-footer-separator">•</span>
          <span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>
      
        </p>
      </div><!-- /.error-footer -->
      
      
          </div><!-- /#cf-error-details -->
        </div><!-- /#cf-wrapper -->
      
        <script type="text/javascript">
        window._cf_translation = {};
      
      
      </script>
      
      
      
      </body></html>
      

    Analysis

    From the extracted page source it is clear that, with the --headless argument, you are landing on a page with:

    • Title: Access denied | www.manta.com used Cloudflare to restrict access
    • Message: What happened? This website is using a security service to protect itself from online attacks.
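The check described above can be automated instead of reading the page source by eye. The following is a minimal sketch, assuming the block page keeps the `cf-error-code` span seen in the console output; the function names `cloudflare_error_code` and `is_access_denied` are hypothetical helpers, not part of Selenium or Cloudflare.

```python
import re

# Matches the error-code span as it appears in the page source shown above.
_CF_ERROR_CODE = re.compile(r'<span class="cf-error-code">(\d+)</span>')


def cloudflare_error_code(page_source):
    """Return the Cloudflare error code found in the page as a string, or None."""
    match = _CF_ERROR_CODE.search(page_source)
    return match.group(1) if match else None


def is_access_denied(page_source):
    """Return True if the page is a Cloudflare 'Error 1020: Access denied' page."""
    return cloudflare_error_code(page_source) == "1020"
```

With a live session, this could be called as `is_access_denied(driver.page_source)` right after `driver.get(...)`, so a script can fail early instead of parsing an error page as if it were the real content.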

    Conclusion

    The browsing context, i.e. the Chrome browser session, is being detected as a bot when run with --headless, so the navigation is blocked.
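One commonly cited difference between the two modes is the user-agent string: headless Chrome of that era reported a `HeadlessChrome` token where normal Chrome reports `Chrome`. Whether this is the signal Cloudflare used here is an assumption, not something the page source confirms. The sketch below only shows how to spot the token; `looks_headless` is a hypothetical helper and the two sample strings are illustrative, not captured from the runs above.

```python
def looks_headless(user_agent):
    """Return True if the user-agent string carries the HeadlessChrome token."""
    return "HeadlessChrome" in user_agent


# Illustrative user-agent strings (assumed values, for demonstration only).
normal_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
)
headless_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/78.0.3904.108 Safari/537.36"
)

print(looks_headless(normal_ua))
print(looks_headless(headless_ua))
```

With a live session, the real string can be read with `driver.execute_script("return navigator.userAgent")` in both modes and compared.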


    Outro

    You can find a couple of relevant discussions in: