python selenium-webdriver web-scraping selenium-chromedriver

Issue using Selenium WebDriver to get text data


I have successfully navigated to a page on a website that has multiple links to further pages. In my code I count the number of links, then build an XPath for each one so I can enter all of the pages. It all works fine until I try to get anything out of these pages. All of them have exactly the same layout, so I should be able to use the same XPath inside each page. Here is the section of the code that fails:

xpath_expression = '//*[@id="wrapper"]/main/section/div/div[2]/table/tbody'
element = browser.find_element("xpath", xpath_expression)
html_code = element.get_attribute("outerHTML")
number_of_pages_to_enter= html_code.count("<tr")
number_of_pages_to_enter=number_of_pages_to_enter-2
xpatht_eleje='//*[@id="wrapper"]/main/section/div/div[2]/table/tbody/tr['


cimek=''
for i in range(number_of_pages_to_enter):
    number_of_pages_to_enter_str=str(number_of_pages_to_enter)
    number_of_pages_to_enter_str=number_of_pages_to_enter_str+"]/td[3]/a"
    xpath_expression=xpatht_eleje+number_of_pages_to_enter_str
    button_locator = ("xpath", xpath_expression)
    button = WebDriverWait(browser, 5).until(
        EC.presence_of_element_located(button_locator)
    )
    button.click()
    time.sleep(0.5)
    element = browser.find_element(By.XPATH, '//*[@id="eventHeader"]/div[2]/div/h1')
    cim=element.text
    cimek=cim+cim
    if number_of_pages_to_enter>0:
        number_of_pages_to_enter=number_of_pages_to_enter-1
    browser.back()

Here is the link to the page I am testing my code on: text. The string called cimek is where I try to collect the data.

I have already tried locators other than XPath; none of them works. The result I get is only one element.


Solution

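  • One source of the issue is that you are looping through all rows using

    for i in range(number_of_pages_to_enter):
    

    but you aren't using i; you are using number_of_pages_to_enter, which you also decrement inside the loop. Because range() is evaluated only once, the loop still runs the original number of times and walks the rows from last to first, so the indexing is fragile rather than outright wrong. The reason only one title survives is cimek=cim+cim, which overwrites cimek on every pass instead of appending to it; cimek=cimek+cim, or a list, would accumulate the titles.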
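To make the indexing point concrete, here is a minimal sketch that builds each row's XPath from the loop variable instead of from a counter that changes inside the loop. It runs without a browser. The helper name row_link_xpath and the example count of 3 are assumptions for illustration, and the two titles are taken from the output listed in this answer, standing in for what element.text would return. Titles are collected in a list because cimek=cim+cim in the question replaces the string on every pass.

```python
# XPath prefix copied from the question.
ROW_PREFIX = '//*[@id="wrapper"]/main/section/div/div[2]/table/tbody/tr['


def row_link_xpath(row):
    """Return the XPath of the link in the third cell of a 1-based table row.

    Hypothetical helper, not part of Selenium.
    """
    return f"{ROW_PREFIX}{row}]/td[3]/a"


# Example value only; the question derives this number from the table HTML.
number_of_pages_to_enter = 3

# XPath positions are 1-based, so shift the 0-based loop variable by one.
xpaths = [row_link_xpath(i + 1) for i in range(number_of_pages_to_enter)]

# Accumulate titles in a list instead of overwriting a string.
cimek = []
for cim in ("Cleveland - Dallas", "Washington - Golden State"):
    cimek.append(cim)

print(xpaths[0])
print(cimek)
```

Each generated XPath could then be passed to the WebDriverWait call from the question in place of the string built from number_of_pages_to_enter.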


    Having said that, you are spending a lot of code on parsing HTML, calculating counts, and building strings to loop through the table when there is a much simpler way: build one locator that matches all the elements you want, pass it to find_elements() (or to an expected condition that returns a list), and then loop through the returned elements.

    Also, you are clicking the link to go to the next page to get who is playing in the game, e.g. "Cleveland - Dallas", but that info is already on the first page.

    Kosárlabda, NBA
    Cleveland - Dallas <<<
    02.27. 18:00
    

    I modified your code to get you this without having to click anything, leave the page and return, etc.

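    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    url = 'https://www.tippmix.hu/sportfogadas#?q=nba&page=1'
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(url)
    
    # Wait up to 10 seconds for every matching link to become visible.
    wait = WebDriverWait(driver, 10)
    games = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//tbody/tr/td[3][@class='title']/a")))
    for game in games:
        # The link text has three lines; the second one names the teams.
        print(game.text.split("\n")[1])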
    

    Output

    Cleveland - Dallas
    Washington - Golden State
    Orlando - Brooklyn       
    New York - New Orleans   
    Atlanta - Utah
    Boston - Philadelphia
    Milwaukee - Charlotte
    Chicago - Detroit    
    Minnesota - San Antonio
    Oklahoma City - Houston
    Portland - Miami
    Hard - Bärnbach/Köflach
    NBA 2023/24
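
The game.text.split("\n")[1] step above assumes that every link's text has at least two lines, as in the three-line sample shown earlier. A slightly more defensive variant is sketched below. The helper name matchup_from_link_text is hypothetical, the "single line" input is made up to show the fallback, and the helper skips blank lines, which the original one-liner does not do.

```python
from typing import Optional


def matchup_from_link_text(text: str) -> Optional[str]:
    """Return the second non-empty line of a link's text, or None if absent.

    Hypothetical helper for illustration; not part of Selenium.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[1] if len(lines) > 1 else None


# Sample link text as shown in the answer above.
sample = "Kosárlabda, NBA\nCleveland - Dallas\n02.27. 18:00"

print(matchup_from_link_text(sample))         # Cleveland - Dallas
print(matchup_from_link_text("single line"))  # None
```

In the loop above, this would replace game.text.split("\n")[1], with rows that return None skipped rather than raising IndexError.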