Tags: python, selenium, selenium-webdriver, heroku

Reached error page: The server at x is taking too long to respond


I want to deploy my application on Heroku. The application scrapes data from an apartment website. For each URL, I have multiple selectors. The application is run on a schedule using APScheduler. The logs show the following error:

2020-08-10T11:02:56.259319+00:00 app[clock.1]: Running main
2020-08-10T11:04:34.374167+00:00 app[clock.1]: Job "main (trigger: interval[3:00:00], next run at: 2020-08-10 14:02:56 UTC)" raised an exception
2020-08-10T11:04:34.374183+00:00 app[clock.1]: Traceback (most recent call last):
2020-08-10T11:04:34.374184+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
2020-08-10T11:04:34.374184+00:00 app[clock.1]: retval = job.func(*job.args, **job.kwargs)
2020-08-10T11:04:34.374185+00:00 app[clock.1]: File "/app/scraper/common.py", line 70, in main
2020-08-10T11:04:34.374186+00:00 app[clock.1]: driver.get(listing.url)
2020-08-10T11:04:34.374187+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
2020-08-10T11:04:34.374188+00:00 app[clock.1]: self.execute(Command.GET, {'url': url})
2020-08-10T11:04:34.374188+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
2020-08-10T11:04:34.374189+00:00 app[clock.1]: self.error_handler.check_response(response)
2020-08-10T11:04:34.374189+00:00 app[clock.1]: File "/app/.heroku/python/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
2020-08-10T11:04:34.374190+00:00 app[clock.1]: raise exception_class(message, screen, stacktrace)
2020-08-10T11:04:34.374191+00:00 app[clock.1]: selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=netTimeout&u=x&d=The%20server%20at%20x%20is%20taking%20too%20long%20to%20respond.

Decoded:

about:neterror?e=netTimeout&u=x&d=The server at x is taking too long to respond.
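
The decoding step can be reproduced with the Python standard library. This is only a minimal sketch: the message string is copied from the log above, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, unquote

# Message text copied from the WebDriverException in the log above.
message = (
    "about:neterror?e=netTimeout&u=x"
    "&d=The%20server%20at%20x%20is%20taking%20too%20long%20to%20respond."
)

# Percent-decode the whole string so it is readable.
decoded = unquote(message)

# Split off the query part and parse it into fields (values are decoded too).
query = message.split("?", 1)[1]
fields = parse_qs(query)

print(decoded)
print(fields["e"][0])  # error code reported by the browser
print(fields["d"][0])  # human-readable description
```

The `e=netTimeout` field is the useful part: it says the browser never received a response from the server, which is different from a page that loaded slowly.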

If I open the link myself, I can access it. I have disabled JavaScript and images so that pages load more quickly.
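
For reference, disabling JavaScript and images could look roughly like the sketch below. It assumes Firefox with geckodriver (the `about:neterror` page in the log is a Firefox error page) and that the installed Firefox version honours these preferences. The helper names `speed_prefs` and `build_options` are made up for illustration and are not taken from my actual code.

```python
def speed_prefs() -> dict:
    """Firefox preferences that skip JavaScript execution and image loading."""
    return {
        "javascript.enabled": False,     # do not run page scripts
        "permissions.default.image": 2,  # 2 = block all images
    }


def build_options():
    """Return headless Firefox options with the preferences above applied."""
    # Imported lazily so speed_prefs() can be inspected without Selenium installed.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    for name, value in speed_prefs().items():
        options.set_preference(name, value)
    return options
```

The options would then be passed as `webdriver.Firefox(options=build_options())`. These settings only reduce the work done after the server answers, so they cannot help when the server does not respond at all.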

I am not sure what the problem is here.


Solution

  • As it turned out, the target website was blocking requests coming from Heroku. The solution is to route the browser's traffic through a proxy.
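
A rough sketch of pointing Selenium's Firefox driver at a proxy follows. It makes several assumptions: the proxy address comes from a hypothetical `PROXY_URL` environment variable (for example set as a Heroku config var), the proxy needs no authentication, and the helper names are invented for this example. The `network.proxy.*` keys are standard Firefox preferences.

```python
import os
from urllib.parse import urlsplit


def parse_proxy(url: str) -> tuple:
    """Split a proxy URL such as http://host:8080 into (host, port)."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL must include host and port: {url!r}")
    return parts.hostname, parts.port


def proxy_prefs(host: str, port: int) -> dict:
    """Firefox preferences that send HTTP and HTTPS traffic through one proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }


def build_driver():
    """Create a headless Firefox driver that uses the proxy from PROXY_URL."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    host, port = parse_proxy(os.environ["PROXY_URL"])  # hypothetical variable name
    options = Options()
    options.add_argument("-headless")
    for name, value in proxy_prefs(host, port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

With this in place, `driver = build_driver()` followed by `driver.get(listing.url)` would go through the proxy instead of the dyno's own IP address. A proxy that requires a username and password needs extra handling that is not shown here.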