python selenium-webdriver web-scraping selenium-chromedriver

Issue using Selenium WebDriver to get text data

I have successfully navigated to a website's page that has multiple links to more pages. In my code I count the number of links, then convert each one to an XPath so I can enter all the pages. Everything works fine until I try to get anything out of these pages. All of them have exactly the same layout, so I should be able to use the same XPath inside each page. Here is the section of the code that fails:

xpath_expression = '//*[@id="wrapper"]/main/section/div/div[2]/table/tbody'
element = browser.find_element("xpath", xpath_expression)
html_code = element.get_attribute("outerHTML")
number_of_pages_to_enter= html_code.count("<tr")
number_of_pages_to_enter=number_of_pages_to_enter-2
xpatht_eleje='//*[@id="wrapper"]/main/section/div/div[2]/table/tbody/tr['


cimek=''
for i in range(number_of_pages_to_enter):
    number_of_pages_to_enter_str=str(number_of_pages_to_enter)
    number_of_pages_to_enter_str=number_of_pages_to_enter_str+"]/td[3]/a"
    xpath_expression=xpatht_eleje+number_of_pages_to_enter_str
    button_locator = ("xpath", xpath_expression)
    button = WebDriverWait(browser, 5).until(
        EC.presence_of_element_located(button_locator)
    )
    button.click()
    time.sleep(0.5)
    element = browser.find_element(By.XPATH, '//*[@id="eventHeader"]/div[2]/div/h1')
    cim=element.text
    cimek=cim+cim
    if number_of_pages_to_enter>0:
        number_of_pages_to_enter=number_of_pages_to_enter-1
    browser.back()

I am testing my code on a tippmix.hu NBA listing page. The string cimek is where I try to collect the data.

I have already tried using locators other than XPath; none of them work. The result I get is only one element.

Solution

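The loop is harder to follow than it needs to be. You loop with

for i in range(number_of_pages_to_enter):

but you never use i; the row index comes from number_of_pages_to_enter, which you decrement by hand. Because range() is evaluated only once, the XPath still walks the rows, just from the last one back to the first. The line that actually loses your data is cimek=cim+cim: on every pass it replaces cimek with the current title doubled, so at the end it only holds the last title. Use cimek += cim, or better, append each title to a list.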
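If you do want to keep the click-through approach, the row index should come from i (XPath positions start at 1, so tr[i + 1]) and each title should be appended rather than reassigned. The Selenium calls need a live browser, so this is only a browser-free sketch of that bookkeeping. The helper names row_link_xpaths, broken_collect and collect_titles are hypothetical, and the title strings are placeholders standing in for element.text.

```python
# Row-link XPath from the question, with a placeholder for the 1-based row position.
ROW_LINK_XPATH = '//*[@id="wrapper"]/main/section/div/div[2]/table/tbody/tr[{}]/td[3]/a'


def row_link_xpaths(row_count):
    """Return one XPath per row; XPath positions are 1-based, hence i + 1."""
    return [ROW_LINK_XPATH.format(i + 1) for i in range(row_count)]


def broken_collect(titles):
    """Reproduces the question's bug: each pass overwrites cimek."""
    cimek = ''
    for cim in titles:
        cimek = cim + cim  # keeps only the last title, doubled
    return cimek


def collect_titles(titles):
    """Keeps every title by appending to a list instead of overwriting."""
    cimek = []
    for cim in titles:
        cimek.append(cim)
    return cimek


if __name__ == "__main__":
    for xpath in row_link_xpaths(3):
        print(xpath)
    # Placeholder titles standing in for element.text on each detail page.
    sample_titles = ["Cleveland - Dallas", "Washington - Golden State"]
    print(broken_collect(sample_titles))
    print(collect_titles(sample_titles))
```

In the real loop you would pass xpath to WebDriverWait exactly as before and call cimek.append(element.text) after reading the title.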

Having said that, you are spending a lot of code parsing HTML, counting rows and building XPath strings just to loop through the table, when there is a much simpler way: build one locator that matches all the elements you want, get them all at once with driver.find_elements(By.XPATH, ...) (or a wait that returns a list), and loop through that list.
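Since Selenium needs a running browser, here is a browser-free illustration of the same "one locator, many matches" idea using the standard library's xml.etree.ElementTree, whose findall() supports a small XPath subset including positional ([3]) and attribute ([@class='title']) predicates. The markup is a hypothetical, heavily simplified stand-in for the real results table, not the page's actual HTML; ElementTree also rejects absolute paths on an element, so the path starts with ".//" instead of "//".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real tippmix.hu table is far more complex.
SAMPLE_TABLE = """<table>
  <tbody>
    <tr><td>1.</td><td>NBA</td><td class="title"><a>Cleveland - Dallas</a></td></tr>
    <tr><td>2.</td><td>NBA</td><td class="title"><a>Washington - Golden State</a></td></tr>
    <tr><td>-</td><td>-</td><td class="odds"><a>2.10</a></td></tr>
    <tr><td>3.</td><td>NBA</td><td class="title"><a>Orlando - Brooklyn</a></td></tr>
  </tbody>
</table>"""


def game_links(markup):
    """Return the text of the link in the 3rd, class="title" cell of every row."""
    root = ET.fromstring(markup)
    # One path matches every row at once, the same idea as
    # driver.find_elements(By.XPATH, "//tbody/tr/td[3][@class='title']/a").
    links = root.findall(".//tbody/tr/td[3][@class='title']/a")
    return [a.text for a in links]


if __name__ == "__main__":
    for name in game_links(SAMPLE_TABLE):
        print(name)
```

The row whose third cell has class="odds" is skipped by the attribute predicate, so no manual row counting or index arithmetic is needed.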

Also, you click each link and open the game's own page just to read who is playing, e.g. "Cleveland - Dallas", but that information is already in the table on the first page:

Kosárlabda, NBA
Cleveland - Dallas <<<
02.27. 18:00

I modified your code to read it directly from the listing page, without clicking anything or leaving the page and coming back.

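from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.tippmix.hu/sportfogadas#?q=nba&page=1'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

# Wait up to 10 s until the game link in the 3rd ("title") cell of every row is visible
wait = WebDriverWait(driver, 10)
games = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//tbody/tr/td[3][@class='title']/a")))
for game in games:
    # The link text is league / teams / start time on separate lines; take the teams
    print(game.text.split("\n")[1])
driver.quit()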

Output

Cleveland - Dallas
Washington - Golden State
Orlando - Brooklyn       
New York - New Orleans   
Atlanta - Utah
Boston - Philadelphia
Milwaukee - Charlotte
Chicago - Detroit    
Minnesota - San Antonio
Oklahoma City - Houston
Portland - Miami
Hard - Bärnbach/Köflach
NBA 2023/24
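game.text.split("\n")[1] relies on each link's text having the league on the first line, the matchup on the second and the start time on the third, as in the "Kosárlabda, NBA / Cleveland - Dallas / 02.27. 18:00" example above. The last two output lines show that searching for "nba" also returns rows that are not NBA games ("Bärnbach" contains "nba"). A minimal sketch of a filter, assuming that layout and assuming NBA rows have a first line ending in "NBA"; nba_matchups is a hypothetical helper, and every sample string except the first is made up for illustration.

```python
def nba_matchups(link_texts):
    """Keep the 2nd line of each link text when it looks like an NBA 'Home - Away' game.

    Assumes the layout seen on the page: league line, matchup line, start time.
    """
    matchups = []
    for text in link_texts:
        lines = [line.strip() for line in text.split("\n")]
        if len(lines) < 2:
            continue  # not the league / matchup / time layout
        league, matchup = lines[0], lines[1]
        if league.endswith("NBA") and " - " in matchup:
            matchups.append(matchup)
    return matchups


SAMPLE_TEXTS = [
    "Kosárlabda, NBA\nCleveland - Dallas\n02.27. 18:00",  # example from the page
    "Other league\nHard - Bärnbach/Köflach",  # made-up league line
    "Kosárlabda, NBA\nNBA 2023/24",  # made-up season-long market row
]

if __name__ == "__main__":
    print(nba_matchups(SAMPLE_TEXTS))
```

With Selenium you would call it as nba_matchups(game.text for game in games); check a few real link texts first, since the league-line assumption is not confirmed for every row.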