
Dynamic web scraping with Selenium not working on TAMU dining website


I am creating a web-scraping API that scrapes my university's dining hall website and returns the food items currently offered, depending on the time of day. Currently, the scraper fails only on the dining hall page.

Other dynamic websites, whose content does not load when JavaScript is disabled, work with the current code (for testing I used a React frontend application I've been working on).

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.chrome.service import Service as ChromeService 
from webdriver_manager.chrome import ChromeDriverManager 
 
url = 'https://music-bridge.lucagiannotti.com/' 
 
driver = webdriver.Chrome(service=ChromeService( 
    ChromeDriverManager().install())) 
 
driver.get(url) 
 

elements = driver.find_elements(By.CLASS_NAME, 'container') 
for title in elements:
    print(title.text)

This code does print the text of every element with the class container. However, when I point it at the dining hall page, the loop prints nothing, regardless of which class name I use.

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.chrome.service import Service as ChromeService 
from webdriver_manager.chrome import ChromeDriverManager 
 
url = 'https://dineoncampus.com/tamu/whats-on-the-menu' 
 
driver = webdriver.Chrome(service=ChromeService( 
    ChromeDriverManager().install())) 
 
driver.get(url) 
 
print(driver.page_source)

# select elements by class name 
elements = driver.find_elements(By.CLASS_NAME, 'table') 
for title in elements:
    print(title.text)

I suspected the page simply hadn't finished loading, but after testing the code below with an explicit wait, that wasn't the problem either. The full page loads and all of the content is visible in the browser, but no element with the class table is ever found.

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.chrome.service import Service as ChromeService 
from webdriver_manager.chrome import ChromeDriverManager 
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

 
url = 'https://dineoncampus.com/tamu/whats-on-the-menu' 
 
driver = webdriver.Chrome(service=ChromeService( 
    ChromeDriverManager().install())) 
driver.get(url) 

wait = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.ID, "table"))
)

 
# print(driver.page_source)

# select elements by class name 
elements = driver.find_elements(By.CLASS_NAME, 'table') 
for title in elements:
    print(title.text)

I'm sure it has something to do with how the page loads its JavaScript, but I don't know exactly what the problem is, and a Google search hasn't turned up any solutions.
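The explicit wait in the last script is essentially a polling loop: WebDriverWait(driver, 20).until(...) re-checks its condition until the condition succeeds or 20 seconds pass, and then raises TimeoutException. The sketch below is a simplified, Selenium-free illustration of that loop, not Selenium's own code; the names wait_until and WaitTimeout are hypothetical. It shows why a wait on an element that never appears can only end in a timeout rather than in an empty result.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy before the deadline."""


def wait_until(condition, timeout=20.0, poll=0.5):
    """Re-check `condition()` every `poll` seconds until it returns something truthy.

    Simplified illustration of the polling idea behind Selenium's
    WebDriverWait.until(); this is not Selenium's actual implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Like until(), hand back whatever the condition found
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met after {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    checks = {"n": 0}

    def menu_rendered():
        # Hypothetical page: the "menu-items" element shows up on the 3rd check
        checks["n"] += 1
        return "menu-items" if checks["n"] >= 3 else None

    print(wait_until(menu_rendered, timeout=5, poll=0.01))

    try:
        # Hypothetical lookup for an element with ID "table" that never exists
        wait_until(lambda: None, timeout=0.05, poll=0.01)
    except WaitTimeout as exc:
        print("timed out:", exc)
```

The key point is the second call: if the locator is wrong, no amount of waiting helps, because the condition is never satisfied.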


Solution

  • The issue is that you're waiting for an element with the ID table to appear, and no element on the page has that ID, so the 20-second wait can only end in a TimeoutException. Wait on a class name instead: table, or something like menu-items.

    Switching the wait to EC.presence_of_element_located((By.CLASS_NAME, "menu-items")) prints the menu:

    Grill
    Item
    Portion
    Calories
    Eggs Made to Order
    2 cage free eggs prepared to order
    Nutritional Info
    1 each
    60
    Hash Browns
    Nutritional Info
    1/2 cup
    180
    Pork Sausage Link
    Spiced breakfast sausage
    Nutritional Info
    2 each
    260
    Vegetable Omelet
    Nutritional Info
    1 each
    140
    Cheese Omelet
    Nutritional Info
    5 oz portion
    ...
    Oats 'n Honey Granola
    Nutritional Info
    1 tbsp
    20
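Once the wait succeeds, element.text comes back as the flattened lines shown above. Since the goal is an API that returns food items, one rough option is to regroup those lines into records. The sketch below assumes the layout seen in this output holds in general: a station name followed by the Item/Portion/Calories headers, then, for each item, a name, optional description line(s), the "Nutritional Info" button label, a portion, and a calorie count. parse_menu_text and MenuItem are hypothetical names, and the heuristic will misparse rows that break that pattern (for example an item with no calorie value), so reading the table cells directly with Selenium is likely more robust.

```python
from dataclasses import dataclass
from typing import List, Optional

HEADER = ["Item", "Portion", "Calories"]  # column headers seen in the output
MARKER = "Nutritional Info"               # button label that ends each item's name block


@dataclass
class MenuItem:
    station: str
    name: str
    description: str
    portion: str
    calories: Optional[int]


def parse_menu_text(text: str) -> List[MenuItem]:
    """Regroup the flattened element.text lines into MenuItem records (heuristic)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    items: List[MenuItem] = []
    station = ""
    pending: List[str] = []  # name and optional description lines of the current item
    i = 0
    while i < len(lines):
        if lines[i:i + 3] == HEADER:
            # Assume the line just before the column headers is the station name
            if pending:
                station = pending[-1]
            pending = []
            i += 3
        elif lines[i] == MARKER and pending and i + 2 < len(lines):
            # Assume the two lines after the marker are portion and calories
            portion, calories = lines[i + 1], lines[i + 2]
            items.append(MenuItem(
                station=station,
                name=pending[0],
                description=" ".join(pending[1:]),
                portion=portion,
                calories=int(calories) if calories.isdigit() else None,
            ))
            pending = []
            i += 3
        else:
            pending.append(lines[i])
            i += 1
    return items


if __name__ == "__main__":
    sample = "\n".join([
        "Grill", "Item", "Portion", "Calories",
        "Hash Browns", "Nutritional Info", "1/2 cup", "180",
    ])
    for item in parse_menu_text(sample):
        print(item)
```

In the scraper, the input would be the .text of each element matched after the wait, e.g. parse_menu_text(element.text).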