Tags: python, web-scraping, scrapy, scrapy-splash

Alternative way to scrape the data from this website?


I am trying to scrape the odds for each game at https://www.oddschecker.com/us/football. I do not see any obvious API calls in the XHR filter of the Chrome DevTools Network tab. Am I missing something here? Where is this data coming from?

I know I could scrape the data by rendering the JavaScript with Splash or Selenium (I am using Scrapy and Python), but I am having major headaches with Splash and can't seem to get any help with them. I was hoping someone could show me a way to access the API so I could skip these tools for loading dynamic websites.

Any suggestions would be appreciated!


Solution

  • If you view the page source, you will see that the data is embedded in a script tag with the id initial-data, so it can be read from the raw HTML without rendering any JavaScript:

    
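    import json

    import requests
    from bs4 import BeautifulSoup

    # Send a browser-like User-Agent instead of the default requests one.
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
    r = requests.get('https://www.oddschecker.com/us/football', headers=headers)
    r.raise_for_status()  # stop early on a 4xx/5xx response

    # The page embeds its data as JSON inside <script id="initial-data">.
    soup = BeautifulSoup(r.text, 'lxml')
    data = json.loads(soup.find("script", {"id": "initial-data"}).get_text(strip=True))

    # Save the parsed data so its structure can be inspected.
    with open("data.json", "w") as f:
        json.dump(data, f)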
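
Note that soup.find(...) returns None when the tag is absent (for example, if the site serves a block page instead of the real one), so the one-line extraction above would then fail with an AttributeError. The following is only a minimal sketch of the same idea with an explicit check, using the standard library's html.parser so it runs without a network request. The helper names (InitialDataParser, extract_initial_data) and the SAMPLE_HTML payload are hypothetical and chosen for illustration; the real JSON on the site will have a different structure.

```python
import json
from html.parser import HTMLParser


class InitialDataParser(HTMLParser):
    """Collects the text inside <script id="..."> (hypothetical helper)."""

    def __init__(self, script_id="initial-data"):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting once the script tag with the wanted id opens.
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_initial_data(html, script_id="initial-data"):
    """Return the JSON embedded in the given script tag, or raise ValueError."""
    parser = InitialDataParser(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    if not raw:
        raise ValueError(f"no <script id={script_id!r}> payload found")
    return json.loads(raw)


# Made-up sample page; the real site's JSON will look different.
SAMPLE_HTML = """
<html><head>
<script id="initial-data" type="application/json">
{"matches": [{"home": "A", "away": "B", "odds": [1.9, 2.1]}]}
</script>
</head><body></body></html>
"""

data = extract_initial_data(SAMPLE_HTML)
print(data["matches"][0]["odds"])
```

In a Scrapy callback, the same function could be given response.text, which avoids Splash or Selenium as long as the data stays embedded in the page source.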