ruby selenium selenium-chromedriver user-agent botdetect

How to prevent fake useragent detection in selenium headless?


I am running a scraping bot in headless mode. As you know, the user agent contains the string "Headless" when Chrome runs in headless mode. To avoid that, I changed the user agent, but the website detects the fake user agent and blocks the bot. How can I prevent this detection?

I am using selenium chromedriver.


Solution

  • Add these options:

        require "selenium-webdriver"

        # Pick the user agent that matches the platform you want to present
        windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
        linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"

        options = Selenium::WebDriver::Chrome::Options.new
        # Turn off the Blink feature that marks the browser as automation-controlled
        options.add_argument("--disable-blink-features=AutomationControlled")
        options.add_argument("--no-sandbox")
        options.add_argument("user-agent=#{linux_useragent}")
        options.add_argument("--disable-web-security")
        options.add_argument("--disable-xss-auditor")
        # Drop the switches that chromedriver adds by default
        options.add_option("excludeSwitches", ["enable-automation", "load-extension"])
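
    A hard-coded user agent can drift away from the Chrome version that is actually installed. One possible alternative, as a rough sketch: read the real user agent once and only remove the "Headless" marker. `strip_headless` is a hypothetical helper name, and the sample string below is an assumed headless user agent built from the Linux value above, not output captured from a browser.

```ruby
# Hypothetical helper: turns a headless Chrome user agent into the one a
# regular Chrome of the same version would send.
def strip_headless(user_agent)
  user_agent.sub("HeadlessChrome", "Chrome")
end

# Assumed example value; in practice this could come from
# driver.execute_script("return navigator.userAgent") in a first session.
headless_ua = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 " \
              "(KHTML, like Gecko) HeadlessChrome/88.0.4324.96 Safari/537.36"

puts strip_headless(headless_ua)
```

    The result can then be passed to the user-agent argument in place of the fixed string.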
    

    navigator.platform and navigator.userAgent must match.

    If the userAgent is for Windows, navigator.platform should be "Win32".

    If the userAgent is for Linux, navigator.platform should be "Linux x86_64".

    You can set it like this:

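    platform = {
      windows: "Win32",
      linux: "Linux x86_64"
    }
    # The script runs in every new document before the page's own scripts.
    # execute_cdp takes the CDP parameters as keyword arguments, so no braces
    # around them (a positional hash raises ArgumentError on Ruby 3).
    driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", source: "
        Object.defineProperty(navigator, 'webdriver', {
          get: () => undefined
        });
        Object.defineProperty(navigator, 'languages', {
          get: () => ['en-US', 'en']
        });
        Object.defineProperty(navigator, 'platform', {
          get: () => \"#{platform[:linux]}\"
        });")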
    

    The script above also sets navigator.webdriver to undefined, which is needed as well.
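
    To keep navigator.platform in sync with the user agent, the platform could be derived from the user-agent string instead of being chosen by hand. A minimal sketch, assuming only the two desktop cases named above; `platform_for` and `stealth_script` are hypothetical helper names, not part of Selenium.

```ruby
require "json"

# Hypothetical helper: maps a user-agent string to the navigator.platform
# value it should be paired with. Only desktop Windows and Linux are covered.
def platform_for(user_agent)
  case user_agent
  when /Windows NT/ then "Win32"
  when /Linux/      then "Linux x86_64"
  else raise ArgumentError, "no platform mapping for: #{user_agent}"
  end
end

# Hypothetical helper: builds the JavaScript that is passed as `source`
# to Page.addScriptToEvaluateOnNewDocument.
def stealth_script(user_agent)
  # to_json yields a quoted, escaped string that is also a valid JS literal
  platform = platform_for(user_agent).to_json
  <<~JS
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
    Object.defineProperty(navigator, 'platform', { get: () => #{platform} });
  JS
end
```

    With a driver in hand, this would be used as driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", source: stealth_script(linux_useragent)).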