Search code examples
pythonselenium-webdriverweb-scrapingxpath

How to locate a button using selenium and xpath


I have tried many different ways I found online on how to locate this button, but after every try, the function gives me an empty list.

I need to locate the button and click it, to scrape different pages. The whole page is dynamicaly loaded and the contents of the second page aren't loaded until you open it, meaning they are not in a DOM until you move to a seccond page. The pages are dynamic as well, meaning there is no change to the url, if you click on different page.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import time
from selenium.webdriver.common.by import By

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

wait = WebDriverWait(driver, 10)

# Go to the webpage
driver.get('https://is.muni.cz/predmet/?volby=obory:4382@fakulta:1433@obdobi:podzim%202023,jaro%202024@jazyky:eng')


links = []

driver.implicitly_wait(15)

for i in range(1):
    
    website = driver.page_source
    soup = BeautifulSoup(website, 'html.parser')

    links += ['https://is.muni.cz' + link['href'] for link in soup.find_all('a', class_='course_link')]

    button = driver.find_elements(By.XPATH, '//a[@class="isi-zobacek-vpravo isi-inline"]')
    button.click()
    time.sleep(5)
    i += 1

print(links)

driver.quit()

This code just returns an error, because the click function doesn't work, because the button has no content. It is an empty list.


Solution

  • First issue - you try to find multiple elements, that will create a list, but these do not have a methode to click.

    Second issue - your selection, there is no a with such a class you try to find.

    So change your selection to:

    driver.find_element(By.XPATH, '//li[@class=" pagination-next"]/a')
    

    Just in case check out expliced waits and also concepts of try / except and may use a while loop to iterate the pages:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from bs4 import BeautifulSoup
    import time
    from selenium.webdriver.common.by import By
    
    # Create a new instance of the Chrome driver
    driver = webdriver.Chrome()
    
    # Go to the webpage
    driver.get('https://is.muni.cz/predmet/?volby=obory:4382@fakulta:1433@obdobi:podzim%202023,jaro%202024@jazyky:eng')
    
    links = []
    
    while True:
        
        website = driver.page_source
        soup = BeautifulSoup(website, 'html.parser')
    
        links.extend(['https://is.muni.cz' + link['href'] for link in soup.find_all('a', class_='course_link')])
        
        try:
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//li[@class=" pagination-next"]/a'))).click()
        except TimeoutException:
            break
    
    print(links)
    
    driver.quit()