python selenium-webdriver web-scraping

Why can't I scrape all data?


With this flow I'm trying to scrape all the data from a specific website. The problem is the output: instead of the list of all home teams, I only get the name of the home team from the first match. What can I do to get all the data from the website?

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.get('https://www.diretta.it')
html = driver.page_source 
soup = BeautifulSoup(html,'lxml')
games = soup.find_all('div', class_ = 'event__match event__match--live event__match--last event__match--twoLine')
for game in games:
    home = soup.find('div', class_ = 'event__participant event__participant--home').text
    away = soup.find('div', class_ = 'event__participant event__participant--away').text
    time = soup.find('div', class_ = 'event__time').text
    print(home)

Solution

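  • First of all, when using Selenium you don't need Beautiful Soup: find_element_by_* finds a single tag, and find_elements_by_* (elements, plural) returns a list of all matching tags. Your loop prints the same team every time because soup.find(...) searches the whole page and returns the first match; search inside each game instead.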

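    Your code would be (Selenium 3 syntax; Selenium 4 removed the find_element_by_* helpers in favour of find_element(By.CSS_SELECTOR, ...) and find_elements(By.CSS_SELECTOR, ...)):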

    from selenium import webdriver
    
    driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
    driver.get('https://www.diretta.it')
    
    # find_elements (plural) returns a list with every matching match row
    games = driver.find_elements_by_css_selector('div[class = "event__match event__match--live event__match--last event__match--twoLine"]')
    
    for game in games:
        # Search inside the current row (game), not the whole page (driver)
        home = game.find_element_by_css_selector('div[class = "event__participant event__participant--home"]').text
        away = game.find_element_by_css_selector('div[class = "event__participant event__participant--away"]').text
        time = game.find_element_by_css_selector('div[class = "event__time"]').text
        
        print(home)
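
To see why the lookup has to start from game rather than from the whole page, here is a minimal sketch that runs without a browser. It is not the site's real markup: the snippet, the shortened class names and the team names are made up for illustration, and it uses the standard library's xml.etree.ElementTree in place of Selenium or Beautiful Soup. The scoping behaviour it shows is the same idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written markup that mimics two match rows.
PAGE = """
<div>
  <div class="event__match">
    <div class="event__participant--home">Team A</div>
    <div class="event__participant--away">Team B</div>
  </div>
  <div class="event__match">
    <div class="event__participant--home">Team C</div>
    <div class="event__participant--away">Team D</div>
  </div>
</div>
"""

root = ET.fromstring(PAGE)
games = root.findall(".//div[@class='event__match']")

# Bug: searching from the document root returns the first match every time.
from_root = [
    root.find(".//div[@class='event__participant--home']").text
    for _ in games
]

# Fix: searching from each row returns that row's own home team.
from_row = [
    game.find("./div[@class='event__participant--home']").text
    for game in games
]

print(from_root)  # ['Team A', 'Team A']
print(from_row)   # ['Team A', 'Team C']
```

In the question's Beautiful Soup code the equivalent change is calling game.find(...) in place of soup.find(...) inside the loop.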