I have the following html code:
<div class="jaHlC">
<div class="C" data-ft="true">
<div class="IuRIu">
<span>
<span class="biGQs _P fiohW uuBRH">
90 places sorted by traveler favorites</span>
</span>
<span class="nzZVd PJ">
I need to extract the text "90 places sorted by traveler favorites".
My Python code, shown below, fails to extract the text:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html"
driver = webdriver.Firefox()
driver.get(url)
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler'))).click()
# attempt 1 : does not work
#number = driver.find_element(By.XPATH, '//span[@class="biGQs _P fiohW uuBRH"]')
# attempt 2: does not work
#number = driver.find_element(By.XPATH, "/html/body/div[1]/main/div[1]/div/div[3]/div/div[2]/div[2]/div[2]/div/div/div[2]/div/div[2]/div/div/section[2]/div/div/div/span[1]/span")
# attempt 3: does not work either
number = driver.find_element(By.CSS_SELECTOR, "span.uuBRH")
Please suggest how I can extract the text. Thank you in advance.
Class attribute values such as biGQs, fiohW and uuBRH are dynamically generated and are bound to change sooner or later. They may change the next time you load the application, or even at the next application startup, so they can't be relied on in locators.
To extract the text "90 places sorted by traveler favorites", you ideally need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR and the text attribute:
driver.get("https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section[data-automation=WebPresentation_WebSortDisclaimer] div > span > span"))).text)
Using XPATH and get_attribute("innerHTML"):
driver.get("https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//section[@data-automation='WebPresentation_WebSortDisclaimer']//div/span/span"))).get_attribute("innerHTML"))
Console output:
90 places sorted by traveler favorites
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
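Putting the pieces together, a minimal end-to-end sketch might look like the following. The function name read_sort_disclaimer and the constants are hypothetical; the data-automation value is the one used in the locators above and is assumed to still be present on the page. The cookie banner is treated as optional, which is also an assumption.

```python
# Locator built on the data-automation attribute instead of the generated class names.
DISCLAIMER_LOCATOR = (
    "section[data-automation='WebPresentation_WebSortDisclaimer'] div > span > span"
)
URL = "https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html"


def read_sort_disclaimer(url=URL, timeout=20):
    """Open the page, dismiss the cookie banner if present, return the disclaimer text."""
    # Imported inside the function so this module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        try:
            # Same consent button the question clicks.
            WebDriverWait(driver, 15).until(
                EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))
            ).click()
        except TimeoutException:
            pass  # assumption: the banner is not always shown
        # Wait until the element is visible before reading its text.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, DISCLAIMER_LOCATOR))
        )
        return element.text
    finally:
        driver.quit()  # always release the browser
```

Calling print(read_sort_disclaimer()) should then print the disclaimer text, provided the page still exposes that section.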
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Links to useful documentation:
get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
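One practical difference between the two: text returns the rendered text, while get_attribute("innerHTML") returns the raw markup inside the element, which can include nested tags, HTML entities and surrounding newlines. If you go the innerHTML route, a small normalizer such as the sketch below may help. The helper name clean_inner_html is hypothetical, and the regex-based tag stripping is only meant for simple fragments like this one, not as a general HTML parser.

```python
import html
import re


def clean_inner_html(raw):
    """Reduce an innerHTML string to plain, single-spaced text."""
    without_tags = re.sub(r"<[^>]+>", " ", raw)  # drop any nested markup
    unescaped = html.unescape(without_tags)      # turn entities such as &amp; into characters
    return " ".join(unescaped.split())           # collapse newlines and repeated spaces
```

For example, clean_inner_html(element.get_attribute("innerHTML")) would strip the leading newline that appears before the text in the HTML shown in the question.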