Tags: python, web-scraping, selenium-chromedriver, undetected-chromedriver

undetected_chromedriver runs slowly, suggestions?


I'm making a price-scraping program and have run into anti-scraping systems. I managed to get around these with undetected_chromedriver (UC), but now I'm running into two issues.

The first is that UC is significantly slower than the standard Chrome driver, though I need it for some sites, so I scrape some sites with a normal driver and others with UC.

The second problem is that I install the standard Chrome driver at the beginning of the program, but once I do that, UC seems to install itself every time I open it. This makes some sites scrape really slowly. Can you help me understand why that is? Any other tips for making the scraper run faster would also be appreciated.

I have this run at the beginning of the program as global variables:

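from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Downloads (or reuses a cached) chromedriver for the standard driver
chrome_path = Service(ChromeDriverManager().install())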

options = webdriver.ChromeOptions()
options.headless = True
options.add_experimental_option('excludeSwitches', ['enable-logging'])

and this runs as a function every time I need a UC:

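import undetected_chromedriver as uc

def start_uc():
    options = webdriver.ChromeOptions()
    # just some options passed in to skip annoying popups;
    # each switch is added separately so Chrome parses it as its own argument
    options.add_argument('--no-first-run')
    options.add_argument('--no-service-autorun')
    options.add_argument('--password-store=basic')
    driver = uc.Chrome(options=options)
    driver.minimize_window()
    return driver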

My scraping functions just loop over the URLs and scrape the info, restarting the driver to clear the cookies if I run into a captcha. The scraping functions look like this (this is pseudocode to give you an idea):

driver = start_uc()
for url in url_list:
    while True:
        try:
            driver.get(url)
            # scrape info
            break
        except Exception:
            # quit() ends the whole browser session; close() only closes the window
            driver.quit()
            driver = start_uc()

I don't see why chrome_path would affect UC. Are there any suggestions for making the scraping functions run more efficiently? I'm not an expert on drivers and their intricacies, so I could be doing something terribly wrong that I don't recognize.

Thank you in advance!


Solution

  • You can use https://github.com/seleniumbase/SeleniumBase to speed things up. (It has a special undetected-chromedriver mode that works with headless mode.)

    pip install -U seleniumbase

    And then run the following with Python:

    from seleniumbase import Driver
    from seleniumbase import page_actions
    
    driver = Driver(headless=True, uc=True)
    driver.get("https://nowsecure.nl")
    page_actions.wait_for_text(driver, "OH YEAH, you passed!", "h1")
    print(driver.find_element("css selector", "body").text)
    screenshot_name = "now_secure_image.png"
    driver.save_screenshot(screenshot_name)
    print("\nScreenshot saved to: %s" % screenshot_name)
    driver.quit()
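
For the loop in the question, one possible shape is sketched below: keep a single driver alive across URLs, replace it only after a failure, and cap the retries so a permanently blocked URL cannot spin forever. This is a rough sketch, not a tested recipe for any particular site. The names `scrape_all`, `make_driver`, `scrape_page`, and `max_attempts` are hypothetical and do not come from SeleniumBase or undetected_chromedriver; the only driver methods assumed are `get()` and `quit()`.

```python
import logging


def scrape_all(urls, make_driver, scrape_page, max_attempts=3):
    """Scrape each URL with one shared driver, restarting it only after a failure.

    make_driver: zero-argument callable returning an object with get() and quit()
                 (hypothetical hook; e.g. a lambda that builds a UC-mode driver).
    scrape_page: callable that takes the driver and returns the scraped data.
    Returns a dict mapping each URL to its data, or None if every attempt failed.
    """
    results = {}
    driver = make_driver()
    try:
        for url in urls:
            for attempt in range(1, max_attempts + 1):
                try:
                    driver.get(url)
                    results[url] = scrape_page(driver)
                    break
                except Exception as exc:
                    logging.warning("attempt %d for %s failed: %s", attempt, url, exc)
                    # Start from a clean session (fresh cookies) before retrying.
                    driver.quit()
                    driver = make_driver()
            else:
                # Every attempt failed; record the gap and move on.
                results[url] = None
    finally:
        driver.quit()
    return results
```

With the SeleniumBase driver above, a call might look like `scrape_all(url_list, lambda: Driver(headless=True, uc=True), my_scraper)`, where `my_scraper` stands in for whatever function extracts the price from the loaded page. Passing the driver constructor in as a parameter also makes it easy to use a plain driver for sites that do not need UC mode.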