python web-scraping dynamic

Scrape calendar results


I'm looking to scrape historical data from:

https://www.racenet.com.au/results/horse-racing

The history is viewed by going to the "Select Date" tab, selecting a date, and clicking the "View Results" button.

You'll notice that interacting with the calendar this way does not change the URL, so I'm lost as to how to cycle through the calendar, bring up the schedule for a particular date, and then access the results. For instance, when I select a date from the calendar manually and then use "View Source" on the returned page, I don't see the links to the individual races.
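Because the calendar widget doesn't change the URL, one option is to skip the widget entirely: generate the dates in code and fetch each day with whatever request the page makes behind the scenes. A minimal sketch of the date-generation half, assuming each day is wanted as an ISO `YYYY-MM-DD` string (`iter_dates` is a hypothetical helper name, not part of any site API):

```python
from datetime import date, timedelta
from typing import Iterator


def iter_dates(start: date, end: date) -> Iterator[str]:
    """Yield every day from start to end (inclusive) as a 'YYYY-MM-DD' string."""
    current = start
    while current <= end:
        yield current.isoformat()
        current += timedelta(days=1)


# Example: the days around the date used in the question
for day in iter_dates(date(2021, 5, 10), date(2021, 5, 12)):
    print(day)
```

Generating dates this way also handles month and leap-year boundaries without any calendar clicking.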

Example: if I randomly select May 11, 2021 from the calendar, Mackay (QLD) is the first track listed. Viewing the source of this page and searching for "Mackay" yields no match. Manually clicking the first race, "R1", changes the URL to https://www.racenet.com.au/results/horse-racing/mackay-20210511/smartstate-rentals-bm65-race-1, which I can then consume without trouble. What I'm stuck on is cycling through the calendar dates and getting hold of those race URLs.
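The example URL suggests each race page is addressed as `<track>-<YYYYMMDD>/<event-slug>-race-<N>`. That pattern is inferred from this one URL, not from any documentation, but if it holds, a collected URL can be split back into its parts, which is handy for checking or de-duplicating scraped links. A sketch; `parse_race_url` and `RaceRef` are hypothetical names:

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

# Assumed path layout, inferred from the example URL only:
#   /results/horse-racing/<track>-<YYYYMMDD>/<event-slug>-race-<N>
RACE_PATH = re.compile(
    r"/results/horse-racing/(?P<track>[a-z0-9-]+)-(?P<date>\d{8})"
    r"/(?P<event>[a-z0-9-]+)-race-(?P<race>\d+)/?$"
)


class RaceRef(NamedTuple):
    track: str
    date: str  # ISO 'YYYY-MM-DD'
    event: str
    race: int


def parse_race_url(url: str) -> Optional[RaceRef]:
    """Split a race results URL into its parts, or return None if it doesn't match."""
    match = RACE_PATH.search(urlsplit(url).path)  # ignore any query string or fragment
    if match is None:
        return None
    day = datetime.strptime(match["date"], "%Y%m%d").date().isoformat()
    return RaceRef(match["track"], day, match["event"], int(match["race"]))


print(parse_race_url(
    "https://www.racenet.com.au/results/horse-racing/mackay-20210511/smartstate-rentals-bm65-race-1"
))
```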

I'm hoping there's a solution in Python; any tips or suggestions on how to solve this would be much appreciated.


Solution

  • Here's a more complete answer that will retrieve all of the horses running in each event at every meeting on the chosen day.

    import json
    import time
    
    import requests
    from bs4 import BeautifulSoup
    
    DATE = "2024-06-07"  # day to fetch, as YYYY-MM-DD
    
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
        "Accept": "*/*",
        "authorization": "Bearer none",
    }
    
    # GraphQL persisted query that lists the meetings (and their events) between two dates.
    # json.dumps builds the JSON-encoded query-string values without hand-written quoting.
    params = {
        "operationName": "meetingsIndexByStartEndDate",
        "variables": json.dumps({"startDate": DATE, "endDate": DATE, "limit": 100}),
        "extensions": json.dumps({"persistedQuery": {"version": 1, "sha256Hash": "998212fede87c9261e0f18e9d8ced2ed04a915453dcd64ae1b5cf5a72cf25950"}}),
    }
    
    response = requests.get("https://puntapi.com/graphql-horse-racing", params=params, headers=headers)
    response.raise_for_status()  # stop early if the API call failed
    
    races = response.json()
    
    for group in races["data"]["meetingsGrouped"]:
        for meeting in group["meetings"]:
            for event in meeting["events"]:
                time.sleep(5)  # pause between page requests to go easy on the site
                print("🟦 " + meeting["name"] + " — " + event["name"] + "\n")
    
                # Results URL = base path + meeting slug + event slug (both from the API)
                url = "https://www.racenet.com.au/results/horse-racing/" + meeting["slug"] + "/" + event["slug"]
    
                print("URL: " + url + "\n")
    
                # The results page is static HTML, so requests + BeautifulSoup are enough
                page = requests.get(url, headers=headers)
                soup = BeautifulSoup(page.text, "html.parser")
    
                # Each runner's name is in one of these <h4> elements
                for name in soup.select("h4.selection-result__info-competitor-name"):
                    print(name.get_text().strip())
    
                print()
    
    1. Uses the underlying API to retrieve the list of meetings and events for a specific date.
    2. Iterates over the meetings and events, builds each event's results URL from the meeting and event slugs, then fetches that page's static HTML and extracts the horses' names. The meeting and race names come from the API response.
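Step 2 can be split so that building the race URLs is separate from fetching them, which makes it easy to gather links for several dates before downloading anything. A sketch, assuming the response has the `data.meetingsGrouped[].meetings[].events[]` shape (each meeting and event carrying `name` and `slug`) that the code above walks. `iter_race_urls` is a hypothetical helper, and `sample` is hand-built from names and slugs in the output further down, not a real API response:

```python
from typing import Any, Dict, Iterator, Tuple

BASE = "https://www.racenet.com.au/results/horse-racing/"


def iter_race_urls(payload: Dict[str, Any]) -> Iterator[Tuple[str, str, str]]:
    """Yield (meeting name, event name, results URL) for every event in the payload."""
    for group in payload["data"]["meetingsGrouped"]:
        for meeting in group["meetings"]:
            for event in meeting["events"]:
                yield meeting["name"], event["name"], f"{BASE}{meeting['slug']}/{event['slug']}"


# Hand-built stand-in for response.json(), mirroring the assumed shape
sample = {"data": {"meetingsGrouped": [{"meetings": [{
    "name": "Ipswich",
    "slug": "ipswich-20240607",
    "events": [{"name": "Put It On Black Mdn Hcp", "slug": "put-it-on-black-mdn-hcp-race-2"}],
}]}]}}

for meeting_name, event_name, url in iter_race_urls(sample):
    print(meeting_name, "|", event_name, "|", url)
```

In the full script, `iter_race_urls(response.json())` could replace the three nested loops, with the `time.sleep` and page fetch moved into a single loop over its results.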

    There's a lot more data in both the API response and the static HTML. You can rummage around in that and find everything you need.
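For that rummaging, a small helper that lists the dotted path to every leaf value in the JSON gives a quick map of which fields exist. This is a generic sketch (the `key_paths` name is hypothetical); to keep the listing short it only descends into the first element of each list and skips empty lists:

```python
from typing import Any, Iterator


def key_paths(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths to leaf values in nested dicts/lists (first list item only)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from key_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        if obj:  # empty lists have no leaves to report
            yield from key_paths(obj[0], f"{prefix}[0]")
    else:
        yield prefix


example = {"data": {"meetingsGrouped": [{"meetings": [{
    "name": "Ipswich",
    "slug": "ipswich-20240607",
    "events": [{"name": "Put It On Black Mdn Hcp", "slug": "put-it-on-black-mdn-hcp-race-2"}],
}]}]}}

for path in key_paths(example):
    print(path)
```

Against the live data, `for path in key_paths(response.json()): print(path)` would show the fields available on the first group, meeting and event.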

    The output looks like this:

    🟦 Ipswich — Tab Ipswich Cup Tickets On Sale Mdn Plate

    URL: https://www.racenet.com.au/results/horse-racing/ipswich-20240607/tab-ipswich-cup-tickets-on-sale-mdn-plate-race-1

    6. Vermeer
    5. Salamancas
    9. Luisana
    8. Fionte
    4. Himeji
    7. Cassie's Girl
    10. Kaytee Sunnyline
    3. Turpin's Torment
    2. Shambolic
    1. Push Turbo

    🟦 Ipswich — Put It On Black Mdn Hcp

    URL: https://www.racenet.com.au/results/horse-racing/ipswich-20240607/put-it-on-black-mdn-hcp-race-2

    3. Find Your Own
    4. Hydros
    8. Look It's Lucy
    9. Starspangle Planet
    7. Literacy
    1. Arnie's Army