Tags: python, django, selenium-webdriver, heroku

Selenium "ERR_CONNECTION_RESET" using headless on Heroku


I have a Django + Selenium app that I'm trying to deploy to Heroku. It includes a management command that starts a Selenium WebDriver.

Locally it runs fine (without headless). Once deployed to Heroku, though, no matter what I try, I just get:

Message: unknown error: net::ERR_CONNECTION_RESET
  (Session info: headless chrome=116.0.5845.140)

I instantiate my webdriver as follows:

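# Imports this snippet relies on (not shown in the original post; assumed)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

...
            logger.info("Starting selenium webdriver...")
            options = Options()
            options.add_argument("--headless")
            options.add_argument("--disable-dev-shm-usage")
            options.add_argument("--no-sandbox")
            options.add_argument("--disable-gpu")
            options.add_argument("--enable-logging")
            options.add_argument("--incognito")
            options.add_argument("--ignore-certificate-errors")
            options.add_argument("--disable-extensions")
            options.add_argument("--dns-prefetch-disable")

            # webdriver-manager downloads a matching chromedriver binary
            self.webdriver = webdriver.Chrome(
                service=Service(ChromeDriverManager().install()), options=options
            )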

I think the issue is the headless argument: adding it locally is, at least, what breaks things.

This runs on Heroku, so I need headless mode to work.

I'm really stumped. Any help/advice is appreciated - thank you!


Solution

  • After wasting an entire day on this, I finally figured it out.

    ERR_CONNECTION_RESET is an odd response here; when scraping, it usually means the site is blocking you. In this case, the site was detecting the headless browser, which is also fairly common.
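One signal that sites are commonly reported to check is the User-Agent: by default, headless Chrome has typically announced itself with a `HeadlessChrome` token. The sketch below illustrates only that single check. `looks_headless` is a hypothetical helper, the sample strings are made up for the example, and the site in question may well rely on other signals.

```python
def looks_headless(user_agent: str) -> bool:
    """Return True if the user-agent string advertises headless Chrome."""
    return "headlesschrome" in user_agent.lower()


# Illustrative user-agent strings (assumed, modeled on Chrome 116 on Linux).
HEADLESS_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/116.0.5845.140 Safari/537.36"
)
REGULAR_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(looks_headless(HEADLESS_UA))
    print(looks_headless(REGULAR_UA))
```

With a live session, the string to inspect can be read with `driver.execute_script("return navigator.userAgent")`.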

    I'm not necessarily recommending it, but you can work around this by using undetected-chromedriver.

    I instantiate my webdriver like so:

    
            # Assumes at module level: import undetected_chromedriver as uc
            options = Options()
            options.add_argument("--disable-gpu")
            options.add_argument("--no-sandbox")
            options.add_argument("--disable-dev-shm-usage")  # needed on heroku for chrome
            options.add_argument("start-maximized")
            # headless is passed to uc.Chrome rather than as a "--headless" argument
            self.webdriver = uc.Chrome(headless=True, use_subprocess=False, options=options)
    

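    The --disable-dev-shm-usage flag isn't directly related to the blocking, but it was needed to get Chrome working on Heroku. There's a lot of information about this elsewhere; it has to do with the way Heroku dynos handle memory. The flag makes Chrome use a temporary directory instead of /dev/shm for its shared memory files.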
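Pulling the pieces above together, a minimal sketch of the setup might look like the following. `HEROKU_CHROME_ARGS`, `build_chrome_args`, and `make_driver` are hypothetical names introduced for illustration; the flags and the `uc.Chrome(...)` call are the ones from the answer. The `undetected_chromedriver` import is deferred so the argument list can be inspected without the package or a browser installed.

```python
# Flags taken from the answer; every name in this sketch is illustrative.
HEROKU_CHROME_ARGS = [
    "--disable-gpu",
    "--no-sandbox",
    "--disable-dev-shm-usage",  # use a temp dir instead of the small /dev/shm
    "start-maximized",
]


def build_chrome_args(extra=None):
    """Return a new list of the base flags plus any extras, without duplicates."""
    args = list(HEROKU_CHROME_ARGS)
    for arg in extra or []:
        if arg not in args:
            args.append(arg)
    return args


def make_driver(extra=None):
    """Start a headless undetected-chromedriver session (requires Chrome)."""
    import undetected_chromedriver as uc  # third-party; imported lazily

    options = uc.ChromeOptions()
    for arg in build_chrome_args(extra):
        options.add_argument(arg)
    # Same call as in the answer: headless is a keyword, not a "--headless" flag.
    return uc.Chrome(headless=True, use_subprocess=False, options=options)


if __name__ == "__main__":
    print(build_chrome_args(["--incognito"]))
```

Keeping the flag list separate from driver creation makes it easy to check in a unit test which arguments a deploy will use, without launching Chrome.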

    Hope this helps someone!