Need help identifying the right XPath


I'm trying to scrape the whole table from this website: https://qmjhldraft.rinknet.com/results.htm?year=2018

When the cell is a plain td (the player names, for example), I can select the column with a simple XPath like this:

players = driver.find_elements(By.XPATH, '//tr[@rnid]/td[4]')

And I can scrape the players' names using this code:

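from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string so the backslashes in the Windows path are not read as escapes.
PATH = r'C:\Program Files (x86)\chromedriver.exe'
# Selenium 4 style: the driver path is passed through a Service object.
driver = webdriver.Chrome(service=Service(PATH))
driver.get('https://qmjhldraft.rinknet.com/results.htm?year=2018')

# Wait up to 10 seconds for the first data row to be present.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//tr[@rnid]/td[1]"))
)

# Player names are in the fourth cell of each data row.
players = driver.find_elements(By.XPATH, '//tr[@rnid]/td[4]')

for player in players[:5]:
    print(player.text)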

But when I get to the "Height" column, I can't find the right XPath. I guess this is because the td has a class, "ht-itemVisibility1", which changes how it has to be selected. I've tried a few different expressions, like:

('//tr/td[@class="ht-itemVisibility1"][1]')
('//tr/td[@class="ht-itemVisibility1"][5]')
('//tr[@rnid]/td[5]')

to no avail. Can someone explain how to write the XPath for a td that has a class? Thanks a lot.


Solution

  • Try this:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from webdriver_manager.chrome import ChromeDriverManager

    # Selenium 4 style: the driver path is passed through a Service object.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.get('https://qmjhldraft.rinknet.com/results.htm?year=2018')

    # Wait up to 10 seconds for the first data row to be present.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//tr[@rnid]/td[1]"))
    )

    # Player names: the fourth cell of each data row.
    players = driver.find_elements(By.XPATH, '//tr[@rnid]/td[4]')
    for player in players[:5]:
        print(player.text)

    # Height: the first cell in each row that carries the class.
    players_height = driver.find_elements(By.XPATH, '//tr/td[@class="ht-itemVisibility1"][1]')
    for player in players_height[:5]:
        print(player.text)

    # Last team: the fifth cell in each row that carries the class.
    players_last_team = driver.find_elements(By.XPATH, '//tr/td[@class="ht-itemVisibility1"][5]')
    for player in players_last_team[:5]:
        print(player.text)

    driver.quit()

    I don't know why it wasn't working for you, but it works fine for me.
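
To see why the index changes once the class test is added, here is a small pure-Python model of how the two predicates combine. It is only a sketch: `select_cells` is a hypothetical helper, and the row layout is invented for illustration rather than copied from the site. In XPath, `td[@class="..."]` first keeps only the cells whose class attribute is exactly that string, and the `[n]` that follows then counts within those remaining cells, not within all the cells of the row.

```python
def select_cells(rows, index, cls=None):
    """Model of `//tr/td[index]` (cls=None) or `//tr/td[@class=cls][index]`.

    rows: list of rows, each a list of (class_name, text) tuples.
    index: 1-based position, as in XPath.
    """
    selected = []
    for row in rows:
        # Step 1: the optional [@class=...] predicate filters the cells first.
        candidates = [text for (c, text) in row if cls is None or c == cls]
        # Step 2: the positional predicate counts within what is left.
        if 1 <= index <= len(candidates):
            selected.append(candidates[index - 1])
    return selected


# Hypothetical row layout, invented for illustration (not taken from the site).
VIS = "ht-itemVisibility1"
rows = [
    [("", "1"), ("", "Player A"), (VIS, "6.01"), (VIS, "185")],
    [("", "2"), ("", "Player B"), (VIS, "5.11"), (VIS, "170")],
]

print(select_cells(rows, 2))           # td[2]: second cell overall
print(select_cells(rows, 1, cls=VIS))  # td[@class=...][1]: first cell with the class
print(select_cells(rows, 3, cls=VIS))  # no third matching cell, so nothing is selected
```

So in this model `td[@class="ht-itemVisibility1"][1]` means "the first cell with that class in each row", wherever it sits among the row's cells, while `td[5]` means "the fifth cell of the row" regardless of class.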
