python web-scraping beautifulsoup python-requests

Why can't BeautifulSoup locate a specific table element in the HTML?


My code can't find the text in the race form table (highlighted in the element below). Which elements do I need to target to actually get that text?

import requests
from bs4 import BeautifulSoup

# URL of the webpage to scrape
url = "https://www.racingandsports.com.au/thoroughbred/horse/smokin-rubi/1978955"

# Define the user agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Send a GET request to the URL with the user agent header
response = requests.get(url, headers=headers)

# Parse the HTML content of the page
soup = BeautifulSoup(response.content, 'html.parser')

# Find the table element with class 'table table-condensed table-striped table-hover tbl-race-form'
table_element = soup.find('table', class_='table table-condensed table-striped table-hover tbl-race-form')

# Check if the table element is found
if table_element:
    # Extract text from the table
    table_text = table_element.get_text(separator='\n', strip=True)
    print(table_text)
else:
    print("No table with the specified class found on the page.")

Solution

  • The expected content is loaded/rendered dynamically by JavaScript, so it is not part of the static HTML response you get via requests.

    However, to get a response that contains the table, change your URL:

    url = "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable?horseIdStr=1978955&dic=thoroughbred"
    

    How can you tell whether content is loaded/rendered dynamically in this case?

    First indicator: open the website in a browser as a human would and notice that a loading animation or delay appears for that area. Second indicator: the content is not included in the static response to your request. You can then use the browser's developer tools, specifically the Network tab filtered to Fetch/XHR, to see which data is loaded from which resources -> http://developer.chrome.com/docs/devtools/network

    If there is an API, use it; otherwise, go with Selenium.
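A minimal sketch of the URL change suggested in the answer, assuming the partial endpoint keeps accepting the `horseIdStr` and `dic` query parameters shown in that URL and returns an HTML fragment whose table still carries the `tbl-race-form` class seen on the rendered page (the code falls back to the first `<table>` if it does not). The helper names `fetch_race_form_html` and `extract_race_form_rows` are hypothetical.

```python
import requests
from bs4 import BeautifulSoup

# Partial endpoint from the answer; the query parameters are passed via `params`.
PARTIAL_URL = "https://www.racingandsports.com.au/Horse/GetRaceFormPartialTable"

# Same browser-like User-Agent as in the question.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    )
}


def fetch_race_form_html(horse_id: str, discipline: str = "thoroughbred") -> str:
    """Request the HTML fragment that the horse page otherwise loads dynamically."""
    params = {"horseIdStr": horse_id, "dic": discipline}
    response = requests.get(PARTIAL_URL, params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


def extract_race_form_rows(html: str) -> list[list[str]]:
    """Return the table as a list of rows, each row a list of cell texts.

    Assumption: the fragment keeps the 'tbl-race-form' class; if not,
    the first <table> in the fragment is used instead.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select_one matches the single class token, regardless of other classes or their order.
    table = soup.select_one("table.tbl-race-form") or soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows with no header or data cells
            rows.append(cells)
    return rows
```

Usage: `rows = extract_race_form_rows(fetch_race_form_html("1978955"))`. Selecting on one class with `select_one("table.tbl-race-form")` is more robust than `find('table', class_='table table-condensed ...')`, because a multi-class string passed to `class_` only matches when the `class` attribute is exactly that string, in that order.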
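The second indicator from the answer (the content is missing from the static response) can also be checked in code. This sketch fetches the raw HTML with requests, which runs no JavaScript, and tests whether a CSS selector matches anything in it. The function names are hypothetical, and `table.tbl-race-form` is simply the class taken from the question.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup


def fetch_static_html(url: str, headers: Optional[dict] = None) -> str:
    """Return the HTML exactly as the server sends it (no JavaScript is executed)."""
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def appears_in_static_html(html: str, css_selector: str) -> bool:
    """True if the CSS selector matches at least one element in the given raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None
```

Usage: if `appears_in_static_html(fetch_static_html(url, headers), "table.tbl-race-form")` returns `False` for the horse page, that suggests the table is added later in the browser, and the Network tab is the place to look for the request that supplies it.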