I am running a scraping bot in headless mode. As you know it contains headless string in useragent when it's running in headless mode. To avoid that issue, I changed useragent. And the website detect this fake useragent and block scraping bot. How can I prevent this detection?
I am using selenium chromedriver.
Please add those options
# windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
# linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--no-sandbox")
options.add_argument("user-agent=#{linux_useragent}")
options.add_argument("--disable-web-security")
options.add_argument("--disable-xss-auditor")
options.add_option("excludeSwitches", ["enable-automation", "load-extension"])
navigator.platform and navigator.userAgent should be matched.
If userAgent is for windows, then navigator.platform should be "Win32"
If userAgent is for linux, then navigator.platform should be "Linux x86_64"
You can set like that
platform = {
windows: "Win32",
linux: "Linux x86_64"
}
driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", {
"source": "
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
}),
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en']
}),
Object.defineProperty(navigator, 'platform', {
get: () => \"#{platform[:linux]}\"
})"
})
and of course you need to set navigator.webdriver to undefined