python-3.x selenium-webdriver data-extraction

Extract a specific element of a table with selenium web driver


Hello, I am trying to extract some elements from this website: https://www.oddsportal.com/basketball/italy/lega-a-super-cup/sassari-brindisi-rTJFaIyk/

I want the highest odds for the home and away teams. These values are at the end of the table: 1.31 and 4.57.

Here is my script:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from concurrent.futures import ThreadPoolExecutor

options = Options()
options.headless = True
options.add_argument("window-size=1400,800")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("start-maximized")
options.add_argument("enable-automation")
options.add_argument("--disable-infobars")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)

driver.get("https://www.oddsportal.com/basketball/italy/lega-a-super-cup/sassari-brindisi-rTJFaIyk/")

home_average_odds = [my_elem.text for my_elem in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@class="highest"]/td[contains(@class, "right")]')))]

for i in home_average_odds:
    print(i)

driver.close()
driver.quit()        

The problem is that I don't get the right result. Here is the output:

1.31
4.30
 100.4%
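The XPath selects every `td` whose class contains `right` in the row with class `highest`, which is why a third value, the payout percentage, shows up. A minimal sketch, assuming the cells always arrive in the order home, away, payout as in the output above: it keeps the first two texts as decimal odds and cross-checks them against the payout using the standard two-way formula 1 / (1/home + 1/away), which reproduces the 100.4%. `split_highest_row` and `two_way_payout` are hypothetical helpers, not Selenium APIs, and this does not explain why 4.30 is scraped where the page shows 4.57.

```python
def split_highest_row(cells):
    """Split the texts scraped from the 'highest' row into odds and payout.

    Assumes the order [home, away, payout], e.g. ['1.31', '4.30', ' 100.4%'].
    """
    home, away = (float(c) for c in cells[:2])
    payout = float(cells[2].strip().rstrip("%")) / 100
    return home, away, payout


def two_way_payout(home, away):
    """Payout of a two-way market: 1 / (1/home + 1/away)."""
    return 1 / (1 / home + 1 / away)


# In the question's script this list would be home_average_odds
cells = ["1.31", "4.30", " 100.4%"]
home, away, payout = split_highest_row(cells)
print(home, away)                             # 1.31 4.3
print(round(two_way_payout(home, away), 3))   # 1.004
print(round(payout, 3))                       # 1.004
```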

Solution

  • What is the "good result"?

    You can get the average by reading the table with pandas and selecting just that row:

    from io import StringIO

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    # options.headless is deprecated in Selenium 4; pass the Chrome flag instead
    options.add_argument("--headless=new")
    options.add_argument("window-size=1400,800")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-gpu")
    options.add_argument("start-maximized")
    options.add_argument("enable-automation")
    options.add_argument("--disable-infobars")
    options.add_argument("--disable-dev-shm-usage")

    driver = webdriver.Chrome(options=options)

    try:
        driver.get("https://www.oddsportal.com/basketball/italy/lega-a-super-cup/sassari-brindisi-rTJFaIyk/")
        # The table may be rendered by JavaScript: wait for it before reading the source
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "table")))
        html = driver.page_source
    finally:
        driver.quit()

    # Passing a literal HTML string to read_html is deprecated since pandas 2.1
    df = pd.read_html(StringIO(html))[0]

    # Keep only the summary row labelled "Average" in the Bookmakers column
    avg = df[df['Bookmakers'] == 'Average']
    print(avg)
    

    Output:

    print (avg)
       Bookmakers     1     2 Payout Unnamed: 4
    49    Average  -408  +291  94.4%        NaN
    

    The output matches the table on the page.

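The `Average` row above is printed in American (moneyline) odds, whereas the question quotes decimal odds; the format presumably follows the odds setting the page is served with, which is an assumption here. A minimal sketch that converts moneyline values to decimal with the standard rules (negative odds: 1 + 100/|odds|, positive odds: 1 + odds/100) and recomputes the payout, reproducing the 94.4% in the output. `american_to_decimal` and `two_way_payout` are hypothetical helpers:

```python
def american_to_decimal(odds):
    """Convert American (moneyline) odds to decimal odds."""
    if odds == 0:
        raise ValueError("moneyline odds cannot be 0")
    if odds < 0:
        return 1 + 100 / -odds   # favourite: stake |odds| to win 100
    return 1 + odds / 100        # underdog: stake 100 to win `odds`


def two_way_payout(home, away):
    """Payout of a two-way market: 1 / (1/home + 1/away)."""
    return 1 / (1 / home + 1 / away)


# Values from the "Average" row printed by the pandas answer
home = american_to_decimal(-408)
away = american_to_decimal(291)
print(round(home, 2), round(away, 2))        # 1.25 3.91
print(f"{two_way_payout(home, away):.1%}")   # 94.4%
```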