python pandas web-scraping playwright playwright-python

Can I click one button after another when web scraping with Playwright for Python?


I'm trying to write code that goes to the website "https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1" and clicks on the horse named "LUCKY MISSILE". That should open a popup window with a table of all the horse's statistics.

Then I want the program to click the "Show All" button on the far right, so that the table shows the statistics from all seasons instead of only the last 3.

This is where my program encounters an issue. It can't seem to find the "Show All" button. Does anyone know how to fix this?

import pandas as pd
import xlsxwriter
from bs4 import BeautifulSoup
from playwright.sync_api import Playwright, sync_playwright, expect
import xlwings as xw

def scrape_ranking(url, sheet_name):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        with page.expect_popup() as popup_info:
            page.click('text="LUCKY MISSILE"')

        page.get_by_text("Show All").click()

        popup = popup_info.value
        popup.wait_for_load_state("domcontentloaded")
        
        html = popup.content()
        browser.close()

    tables = pd.read_html(html)
    df = tables[7]
    with pd.ExcelWriter("hkjc.xlsx", engine="openpyxl", mode='a', if_sheet_exists='overlay') as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=True)


url = ('https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1')
scrape_ranking(url, "LUCKY MISSILE")

Solution

  • That "button" looks like it has the text "Show All", but the text is rasterized onto an image (shudder), so get_by_text has no text to match:

    <img
      src="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg"
      alt="Show All"
      style="width: 92px; height: 24px"
      id="hf_allr_btn_r"
      class="active"
      delsrc="/racing/content/Images/StaticFile/English/hf_allr_btn.jpg"
      border="0"
    />
    

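    You could select it by its alt text instead. Note that the call is made on popup, not on the original page (where the code in the question was looking), so it has to come after popup = popup_info.value: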

    popup.get_by_alt_text("Show All").click()
    

    which triggers a navigation, leading to a new page.

    Moral of the story: use the browser's dev tools to inspect the element to see what it really is.
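
Putting the pieces together, the corrected flow might look roughly like the sketch below. The function names click_show_all and scrape_horse_html are made up for illustration, and the use of expect_navigation assumes the click navigates inside the same popup window rather than opening yet another one; if the site behaves differently, that wait would need adjusting.

```python
def click_show_all(popup):
    """Hypothetical helper: click the image-based "Show All" button on the
    popup page, wait for the navigation it triggers, and return the HTML."""
    popup.wait_for_load_state("domcontentloaded")
    # The button is an <img alt="Show All">, so match on the alt text
    # rather than on visible text.
    # Assumption: the click navigates within this same popup page.
    with popup.expect_navigation():
        popup.get_by_alt_text("Show All").click()
    return popup.content()


def scrape_horse_html(url, horse_name):
    """Hypothetical wrapper: open the race card, click the horse's name,
    then expand the statistics table in the popup that opens."""
    # Imported here so click_show_all can be tried out without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Clicking the horse's name opens the popup.
        with page.expect_popup() as popup_info:
            page.click(f'text="{horse_name}"')

        # Everything from here on happens on the popup, not on `page`.
        popup = popup_info.value
        html = click_show_all(popup)
        browser.close()
    return html
```

Calling scrape_horse_html(url, "LUCKY MISSILE") with the URL from the question should then return the popup's HTML, which can be passed on to pd.read_html as in the original script.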