
Can a website detect when using Chromium via Puppeteer?


When scraping a website using Chromium with Node and Puppeteer (not Selenium and ChromeDriver), the site is able to detect the automation and blocks me, throwing a customized error instead of serving the pages, while the same pages load properly when Chromium is launched manually. So the question: is there a way to detect the anti-bot software installed on a website and bypass it during browser automation?

PS: I have gone through all the points discussed thoroughly in "Can a website detect when you are using selenium with chromedriver?" and performed the relevant tests covering every key point gathered there, but ended up with the same results as with Selenium. Hence I would like to know whether there are any newer findings, or any newer automation technology, that counter this technical challenge. Also, replacing $cdc_ no longer works with the latest versions of Selenium plus ChromeDriver, as per my tests last night.

Example site: https://www.naukri.com/posted-today-jobs . I'm trying to scrape the jobs listed there using Chromium+Node+Puppeteer, but the site detects and blocks me as soon as the pages open in new tabs, in both headless and headful modes. Same results with the latest Selenium+Node+ChromeDriver.
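For reference, the usual first countermeasure for the fingerprints mentioned above is to patch the tell-tale browser properties before any page script runs. The sketch below covers only the widely documented tells (navigator.webdriver, the missing window.chrome object, an empty plugin list) and is not claimed to defeat this particular site — as noted, such tricks stop working as anti-bot vendors update their checks:

```javascript
// Sketch: masking common automation fingerprints.
// Written as a plain function over (nav, win) so it can also be
// exercised directly outside a browser.
function patchFingerprint(nav, win) {
  // Chromium under automation exposes navigator.webdriver === true.
  Object.defineProperty(nav, 'webdriver', { get: () => undefined });
  // Headless Chrome historically shipped without window.chrome; stub it.
  if (!win.chrome) win.chrome = { runtime: {} };
  // An empty plugin list is another frequent headless tell.
  if (!nav.plugins || nav.plugins.length === 0) {
    Object.defineProperty(nav, 'plugins', { get: () => [1, 2, 3] });
  }
}

// Hypothetical Puppeteer usage: serialize the patch so it runs in-page
// before any of the site's scripts (assumes `page` is a Puppeteer Page):
// await page.evaluateOnNewDocument(`(${patchFingerprint})(navigator, window)`);
```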


Solution

  • Yes it can - as you state yourself. Contact the site's admin or developer to deactivate it for you or hand you a tester bypass key. Another option is to have them whitelist your IP, since you surely are a legitimate user working for their company, not trying to leech other people's data, costing them web-hosting capacity and driving up their bill.
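For context on the "yes it can" part: anti-bot scripts commonly probe exactly the properties the question mentions. A simplified, hypothetical detector is sketched below — real products check many more signals, plus server-side heuristics such as IP reputation and request timing, which is why patching a few properties rarely suffices:

```javascript
// Sketch of the kind of client-side checks an anti-bot script might run.
// `env` stands in for the browser globals; the property names are the
// widely documented tells, not any specific vendor's implementation.
function looksAutomated(env) {
  const signals = [];
  // The W3C WebDriver spec requires navigator.webdriver to be true
  // when the browser is under automation.
  if (env.navigator.webdriver) signals.push('navigator.webdriver');
  // Older ChromeDriver builds leaked $cdc_-prefixed globals.
  if (Object.keys(env.document || {}).some((k) => k.startsWith('$cdc_'))) {
    signals.push('$cdc_ key');
  }
  // Headless Chrome's default user agent identifies itself.
  if (/HeadlessChrome/.test(env.navigator.userAgent || '')) {
    signals.push('headless user agent');
  }
  return signals; // non-empty => likely automated
}
```

A manually launched Chromium returns an empty list from checks like these, which matches the behaviour described in the question.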