I am trying to scrape the table from this website "https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1", then click on the horse names, which lead to a new page, and scrape the tables there as well.
This is the code I currently have. It is just test code for the first horse (some of the imports are for later use):
```python
import pandas as pd
import xlsxwriter
from bs4 import BeautifulSoup
from playwright.sync_api import Playwright, sync_playwright, expect
import xlwings as xw


def scrape_ranking(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.click('text="AI ONE"')  # the link that will lead us to the horse info
        html = page.content()
        browser.close()
    tables = pd.read_html(html)
    df = tables[0]
    df.to_excel("hkjc.xlsx", index=False)


url_1 = ('https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1')
scrape_ranking(url_1)
```
This code doesn't crash. However, instead of the horse record table, it outputs the original table from the race card page linked above.
Is there a way to make the code click the horse name (the link), follow it to the new page (the horse record), and output that table?
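Clicking the horse name opens the horse's details in a popup window, so `page.content()` still returns the race card. Catch the popup and wait for it to load, as described in the Playwright docs on handling popups and waiting for load states: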
```python
# ...
page.goto(url)
with page.expect_popup() as popup_info:
    page.click('text="AI ONE"')
popup = popup_info.value
popup.wait_for_load_state("domcontentloaded")
html = popup.content()
# ...
```