python selenium web-scraping urllib

Error when trying to web scrape with urllib.request


I am trying to get the HTML of the following page: https://betway.es/es/sports/cpn/tennis/230 in order to extract the match names and the odds, using this Python code:

from bs4 import BeautifulSoup
import urllib.request

url = 'https://betway.es/es/sports/cpn/tennis/230'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
soup = str(soup)

But when I run the code, it throws the following exception: HTTPError: HTTP Error 403: Forbidden. I have seen that setting headers might help, but I am completely new to this module and have no idea how to use them. Any advice? In addition, even when I manage to download the page, I cannot find the odds in it. Does anyone know what the reason could be?


Solution

  • Unfortunately, I'm in a country that this site blocks.
    However, using the requests package, you can send a browser-like User-Agent header:

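    import requests as rq
    from bs4 import BeautifulSoup as bs
    
    url = 'https://betway.es/es/sports/cpn/tennis/230'
    # Send a browser-like User-Agent; some sites answer 403 to a library's default one
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
    page = rq.get(url, headers=headers)
    page.raise_for_status()  # raise an exception if the server still returns an error status
    soup = bs(page.text, 'html.parser')  # parse the downloaded HTML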
    

    You can find the headers your own browser sends in its developer tools: F12 -> Network -> select any request -> Headers tab.
    Since I can't test this against the site, it is only a partial answer.
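
The question asks how to use headers with urllib.request itself. A minimal sketch, assuming the 403 is caused only by the default User-Agent (the site may check other things as well): wrap the URL in a urllib.request.Request that carries the headers, then pass that object to urlopen. The helper names build_request and fetch_html are made up for illustration.

```python
import urllib.request

URL = 'https://betway.es/es/sports/cpn/tennis/230'
# Assumption: a browser-like User-Agent is enough to avoid the 403.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}


def build_request(url, headers=HEADERS):
    """Return a Request object carrying custom headers (no network access yet)."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, timeout=10):
    """Download the page with the custom headers and return it as decoded text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')


# Usage (performs a real HTTP request, so it is left commented out):
# html = fetch_html(URL)
# soup = BeautifulSoup(html, 'html.parser')
```

If the odds are still missing from the returned HTML, one possible reason is that the page fills them in with JavaScript after it loads. Neither urllib nor requests executes JavaScript, so in that case a browser-driving tool such as Selenium would be needed. This is a guess that has not been verified for this site.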