I want to scrape tennis data from this page: https://www.tennisabstract.com/cgi-bin/leaders.cgi for an assignment.
I need to use Python libraries in Jupyter Notebook.
When I try to scrape this .cgi page I am unable to get any data from the table. Is there a way to scrape a .cgi page?
The code I am trying is:
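from io import StringIO
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.tennisabstract.com/cgi-bin/leaders.cgi"
response = requests.get(url, headers={"User-Agent": "XY"})
pd.set_option('display.max_rows', None)
# Parse the static HTML once and collect every <table> element
soup = BeautifulSoup(response.content, "lxml")
tables = soup.find_all("table")
# Wrap the markup in StringIO; newer pandas versions no longer accept a raw HTML string
df = pd.read_html(StringIO(str(tables)))
df = df[1]
df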
The output I get is shown below. It changes when I use df[0] and fails at df[2], even though the same approach works for other tables on the site's HTML pages:
0 | 1 | |
---|---|---|
0 |   | Stats: Serve | Return | Breaks | More |
1 | nan | nan |
2 | nan | nan |
The data is loaded and rendered dynamically by JavaScript, so the table is not in the static response for this resource.
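One way to check this for yourself is to count the rows of the target table in the HTML you actually received. The sketch below uses only the standard library. `TableRowCounter` and `count_rows` are names made up for this illustration, and the two HTML strings are hypothetical stand-ins for the server response and the browser-rendered page, not the real markup of the site.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <tr> elements inside the <table> that has a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0  # > 0 while we are inside the target table
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                self.depth += 1  # nested table inside the target
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1  # entered the target table
        elif tag == "tr" and self.depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def count_rows(html, table_id):
    """Return the number of rows found in the table with id `table_id`."""
    parser = TableRowCounter(table_id)
    parser.feed(html)
    return parser.rows


# Hypothetical stand-ins: static response vs. page after JavaScript has run
static_html = '<table id="matches"></table>'
rendered_html = (
    '<table id="matches">'
    "<tr><th>Rk</th><th>Player</th></tr>"
    "<tr><td>1</td><td>Novak Djokovic [SRB]</td></tr>"
    "</table>"
)

print(count_rows(static_html, "matches"))    # 0
print(count_rows(rendered_html, "matches"))  # 2
```

With the code from the question, `count_rows(response.text, "matches")` would run the same check on the real response. A result of 0 means the table is either missing or empty in the static HTML, which is consistent with the rows being filled in later by JavaScript.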
You could try to fetch and process the data directly from https://www.minorleaguesplits.com/tennisabstract/cgi-bin/jsmatches/leadersource.js
Alternatively, you could mimic a browser with e.g. selenium and use the rendered version of the page source:
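from io import StringIO
import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome()
url = 'https://www.tennisabstract.com/cgi-bin/leaders.cgi'
driver.get(url)
# Parse the browser-rendered DOM rather than the static response; the stats table has id="matches"
df = pd.read_html(StringIO(driver.page_source), attrs={'id': 'matches'})[0]
driver.quit()  # close the browser once the page source has been read
df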
Rk | Player | M | M W-L | M W% | SPW | SPW-InP | Aces | Ace% | DFs | DF% | DF/2s | 1stIn | 1st% | 2nd% | 2%-InP | Hld% | Pts/SG | PtsL/SG | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Novak Djokovic [SRB] | 58 | 49-9 | 84.5% | 69.1% | 68.4% | 436 | 8.7% | 147 | 2.9% | 8.1% | 63.9% | 76.2% | 56.7% | 61.6% | 87.6% | 6.1 | 1.9 |
1 | 2 | Jannik Sinner [ITA] | 76 | 65-11 | 85.5% | 69.1% | 68.0% | 485 | 8.3% | 137 | 2.4% | 6.0% | 60.5% | 76.8% | 57.2% | 60.9% | 89.6% | 6.1 | 1.9 |
2 | 3 | Carlos Alcaraz [ESP] | 76 | 62-14 | 81.6% | 67.2% | 67.3% | 319 | 5.6% | 160 | 2.8% | 8.3% | 66.1% | 72.6% | 56.8% | 61.9% | 85.9% | 6.2 | 2 |
... | |||||||||||||||||||
48 | 49 | Zhizhen Zhang [CHN] | 50 | 26-24 | 52.0% | 64.5% | 63.3% | 340 | 8.3% | 119 | 2.9% | 8.0% | 63.9% | 72.0% | 51.2% | 55.6% | 80.7% | 6.3 | 2.2 |
49 | 50 | Daniel Evans [GBR] | 40 | 16-24 | 40.0% | 63.4% | 64.4% | 163 | 5.3% | 135 | 4.4% | 10.4% | 57.6% | 71.8% | 52.1% | 58.1% | 79.2% | 6.3 | 2.3 |
50 | nan | Average | nan | nan | 61.2% | 65.7% | 64.8% | nan | 8.6% | nan | 3.3% | 9.0% | 62.8% | 73.7% | 52.2% | 57.3% | 83.2% | 6.3 | 2.2 |