Tags: python, web-scraping, beautifulsoup, webdriver

Scraping after selecting all in a scrolling menu


I am trying to retrieve information from a table on this page: https://ski-resort-stats.com/ski-resorts-in-europe/

The page has a drop-down menu, which I must act on first so that all the entries are shown on the page and can be selected. But when I then retrieve the information I am looking for, I do not get the whole table. I tried adding a sleep between the two actions in case it was related to that, but nothing changes. Could someone help me with that? Here is my code:

import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome("path/chromedriver")
driver.get("https://ski-resort-stats.com/ski-resorts-in-europe/")

# Parse the page source as it is right after loading
content = driver.page_source
soup = BeautifulSoup(content)

# Select "All" in the drop-down menu to show all the ski resorts
menu = driver.find_element_by_id("table_1_length")
for option in menu.find_elements_by_tag_name('option'):
    if option.text == 'All':
        option.click()
        break

time.sleep(10)

# Collect the resort-name cells from the parsed page
mydivs = soup.find_all("td", {"class": "column-resort-name"})
print(mydivs)

So the last element printed from mydivs is not the last element of the table.


Solution

  • All the data is already present in the page's <table>, so it can be read with requests and BeautifulSoup alone:

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://ski-resort-stats.com/ski-resorts-in-europe/"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    
    # print some data from rows
    for row in soup.select("#table_1 tbody tr"):
        r = [td.get_text(strip=True) for td in row.select("td")]
        print(r[1])
    

    Prints:

    Hemsedal
    Geilosiden Geilo
    Golm
    Hafjell
    Voss
    Hochschwarzeck
    Rossfeld - Berchtesgaden - Oberau
    
    ...
    
    Puigmal
    Kranzberg-Mittenwald
    Wetterstein lifts-Wettersteinbahnen-– Ehrwald
    Stuhleck-Spital am Semmering
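
If you would rather keep Selenium, one likely reason the question's code misses rows is that soup is built from driver.page_source before "All" is clicked, so it never sees anything the page changes afterwards, however long the sleep is. Below is a minimal sketch of that idea, not a tested fix for this site: it assumes Selenium 4.6+ (which locates chromedriver itself), the helper names resort_names and scrape_with_selenium are made up for illustration, and the two-second sleep is an arbitrary crude wait. The two HTML strings at the end are invented stand-ins for page_source before and after the click.

```python
from bs4 import BeautifulSoup


def resort_names(html):
    """Return the text of every td.column-resort-name cell in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", {"class": "column-resort-name"})
    return [td.get_text(strip=True) for td in cells]


def scrape_with_selenium(url):
    """Click "All" first, then parse the page source (hypothetical helper)."""
    # Imported here so resort_names() stays usable without Selenium installed.
    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Selenium 4.6+ finds the driver
    try:
        driver.get(url)
        # Same element id and option text as in the question.
        menu = driver.find_element(By.ID, "table_1_length")
        for option in menu.find_elements(By.TAG_NAME, "option"):
            if option.text == "All":
                option.click()
                break
        time.sleep(2)  # crude wait; the value is arbitrary
        # Read page_source only now, after the click.
        return resort_names(driver.page_source)
    finally:
        driver.quit()


# Offline illustration with invented HTML: a snapshot taken before the click
# cannot contain rows that only show up in a later snapshot.
before = '<table><tr><td class="column-resort-name">Hemsedal</td></tr></table>'
after = (
    '<table>'
    '<tr><td class="column-resort-name">Hemsedal</td></tr>'
    '<tr><td class="column-resort-name">Golm</td></tr>'
    '</table>'
)
print(resort_names(before))
print(resort_names(after))
```

Calling scrape_with_selenium("https://ski-resort-stats.com/ski-resorts-in-europe/") would then parse the page as it looks after the selection. The requests version above is still simpler if the rows are all in the initial HTML.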