
503 Error When Trying To Crawl One Single Website Page | Python | Requests


Goal: I am trying to scrape the HTML from this page: https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=.

(Note: I will eventually want to paginate and scrape all job listings from this page.)

My issue: I get a 503 error when I try to scrape the page using Python and Requests. I am working in Google Colab.

Initial Code:

import requests

url = 'https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d='

response = requests.get(url)

print(response)

Attempted solutions:

  1. Sending the header 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
  2. Implementing this code I found in another thread:
import requests

def getUrl(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
    }
    res = requests.get(url, headers=headers)
    res.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return res

getUrl('https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=')
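For completeness, another fallback I could try is retrying transient 5xx responses with exponential backoff. The sketch below is generic (the fetcher is injected, so any HTTP client whose responses expose `.status_code` works); a caveat is that retries only help with genuinely transient server errors, not a deliberate anti-bot block:

```python
import time

def fetch_with_retry(fetch, url, retries=3, backoff=1.0):
    """Call fetch(url) until the status is below 500 or retries run out.

    `fetch` is any callable returning an object with a .status_code
    attribute (e.g. requests.get); it is injected so the retry logic
    stays independent of the HTTP library.
    """
    for attempt in range(retries):
        response = fetch(url)
        if response.status_code < 500:
            return response
        time.sleep(backoff * (2 ** attempt))  # exponential backoff: 1s, 2s, 4s, ...
    return response  # last (still failing) response after exhausting retries
```

With Requests this would be called as `fetch_with_retry(requests.get, url)`.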

I am able to access the website via my browser.

Is there anything else I can try?

Thank you


Solution

  • That page is protected by Cloudflare. There are a few options for trying to bypass it; using cloudscraper seems to work:

    import cloudscraper
    
    scraper = cloudscraper.create_scraper()
    url = 'https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d='
    
    html = scraper.get(url).text
    
    print(html)
    

    In order to use it, you'll need to install it:

    pip install cloudscraper
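
    Since the eventual goal is to paginate through all job listings, here is a minimal sketch for building per-page URLs with the standard library. It assumes the site paginates via a `page` query parameter — that name is a guess, so check the site's actual pagination links before relying on it:

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def page_url(base_url, page):
    """Return base_url with a 'page' query parameter added.

    The parameter name 'page' is an assumption; inspect the site's
    real pagination links to confirm it.
    """
    parts = urlparse(base_url)
    # keep_blank_values preserves the empty q=, l=, etc. parameters
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query['page'] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))

base = 'https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d='
print(page_url(base, 2))
# -> https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=&page=2
```

Each generated URL can then be fetched through the same cloudscraper session, reusing any cookies Cloudflare sets on the first request.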