Tags: python, selenium, xpath, beautifulsoup, webdriverwait

Why can't Selenium with the Firefox WebDriver scrape website tags loaded by AJAX?


I want to get the text of some HTML tags from bonbast, where some elements are loaded by AJAX (for example, the tag with the id "ounce_top"). I have tried Selenium with geckodriver, but I still cannot scrape these tags. Moreover, when the automated Firefox window (driven by geckodriver) opens, these elements are not shown on the page at all. I have no idea why this happens. How can I scrape this website?

Code trials:

from selenium import webdriver
from bs4 import BeautifulSoup

url_news = 'https://bonbast.com/'
driver = webdriver.Firefox()
driver.get(url_news)
html = driver.page_source
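soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
a = driver.find_element_by_id(id_="ounce_top")  # looked up immediately, with no wait for AJAX-loaded content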

Solution

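  • The desired element is loaded dynamically, so to extract its text (e.g. 1,817.43) you need to induce WebDriverWait for visibility_of_element_located(). The snippets below first wait for the cookie-consent button to become clickable and click it, then wait for the element itself to become visible. You can use either of the following Locator Strategies: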

    • Using CSS_SELECTOR:

      driver.get("https://bonbast.com/")
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click()
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span#ounce_top"))).text)
      
    • Using XPATH:

      driver.get("https://bonbast.com/")
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click()
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@id='ounce_top']"))).text)
      
    • Console Output:

      1,817.43
      
    • Note: You have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      

    You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
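
To see why the wait matters, here is a minimal pure-Python sketch of the polling idea behind WebDriverWait. It is not Selenium's actual implementation. The names wait_until, WaitTimeout, page, and ounce_text are hypothetical and exist only for this illustration, and a plain dict stands in for the page so the example runs without a browser. WebDriverWait behaves similarly: it re-checks the condition every 0.5 seconds by default, ignores NoSuchElementException while polling, and raises TimeoutException when the time limit is reached.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # "Not found yet" means "keep waiting", much as WebDriverWait
            # ignores NoSuchElementException while it polls.
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose element is filled in by AJAX after a delay.
page = {}
calls = {"n": 0}


def ounce_text():
    calls["n"] += 1
    if calls["n"] == 3:
        page["ounce_top"] = "1,817.43"  # the simulated AJAX response arrives
    return page["ounce_top"]  # raises KeyError (a LookupError) until then


print(wait_until(ounce_text, timeout=5, poll_interval=0.1))
```

A single lookup right after driver.get(), as in the question, corresponds to calling ounce_text() once: it fails because the content is not there yet. Polling with a timeout gives the AJAX request time to finish.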