Search code examples
pythonhtmlcssseleniumpycharm

Scraping a table with Selenium and printing it


I want to scrape all tables from a site. The automation is required to reach the tables, so you might consider that. My attempt with research is the following:

from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = Firefox(executable_path='/Users/.../PycharmProjects/Sportwinner/geckodriver')
driver.get("https://bskv.sportwinner.de/")
element = driver.find_element(By.ID, "id-button-einstellungen")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
driver.find_element(By.ID, "id-button-einstellungen").click()
element = driver.find_element(By.CSS_SELECTOR, "body")
actions = ActionChains(driver)
actions.move_to_element(element).perform()
driver.find_element(By.ID, "id-klub-name").click()
driver.find_element(By.ID, "id-klub-name").send_keys("Dreieck Schweinfurt")
driver.find_element(By.ID, "id-button-einstellungen-option-ok").click()
time.sleep(1)
driver.find_element(By.ID, "id-dropdown-liga").click()
driver.find_element(By.LINK_TEXT, "Letzte Spielwoche").click()

tableContent = driver.find_elements_by_css_selector("id-table-spiel tr")
for row in tableContent:
    print(row.text)

Since I just heard about Selenium a couple of hours ago, I am a total noobie. I have no clue if this works, because I don't see any output. Is anybody able to help me with my attempt (I guess it's not correct) and how it is possible for me to see the result? I am using PyCharm for compiling.


Solution

  • The execution was so fast that it was not able to extract details from the table.

    You need apply Implicit wait or Explicit waits so that the table data shows up and can be able to extract details.

    # Imports Required
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait
    
    ...
    driver = webdriver.Chrome(executable_path="chromedriver.exe") # Have tried in Chrome
    driver.implicitly_wait(20)
    
    # Or apply Explicit wait like below.
    wait = WebDriverWait(driver,30)
    wait.until(EC.presence_of_element_located((By.XPATH,"//table[@id='id-table-spiel']//tbody/tr")))
    
    tableContent = driver.find_elements_by_xpath("//table[@id='id-table-spiel']//tbody/tr//div")
    for row in tableContent:
        print(row.get_attribute("innerText")) # row.text works too.
    

    You can try like this:

    tableContent = driver.find_elements_by_xpath("//table[@id='id-table-spiel']//tbody/tr//a")
    
    for i in range(len(tableContent)):
        tableContent[i].click() # Clicks on the "+" icon
        innerrows = driver.find_elements_by_xpath("//tr[@class='detail-view'][{}]//tr".format(i+1)) #Find the rows inside the 1st and 2nd row.
        for inrow in innerrows:
            elemnets = inrow.find_elements_by_xpath(".//div") # Data are in "div" tags
            data = [] #Collect each row in a list
            for j in elemnets:
                data.append(j.text)
            print(data)
    
    ['', '', '1', '', '2', '', '3', '', '4', '', 'Kegel', '', 'SP', '', 'MP', '', '', '', 'MP', '', 'SP', '', 'Kegel', '', '4', '', '3', '', '2', '', '1', '', '', '']
    ['Krug, Tobias', '141', '141', '136', '86', '141', '152', '124', '131', 'Brandl, Gerald']
    ['Keller, Ralf', '148', '135', '139', '130', '140', '111', '154', '145', 'Haschke, Jens']