python selenium-webdriver web-scraping geckodriver

How to get the text using selenium in Python with geckodriver

I have the following html code:

<div class="jaHlC">
<div class="C" data-ft="true">
<div class="IuRIu">
<span>
<span class="biGQs _P fiohW uuBRH">
90 places sorted by traveler favorites</span>
</span>
<span class="nzZVd PJ">

I need to extract the text saying "90 places sorted by traveler favorites"

My Python code, shown below, fails to extract the text:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html"

driver = webdriver.Firefox()
driver.get(url)

WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler'))).click()

# attempt 1 : does not work
#number = driver.find_element(By.XPATH, '//span[@class="biGQs _P fiohW uuBRH"]')

# attempt 2: does not work
#number = driver.find_element(By.XPATH, "/html/body/div[1]/main/div[1]/div/div[3]/div/div[2]/div[2]/div[2]/div/div/div[2]/div/div[2]/div/div/section[2]/div/div/div/span[1]/span")

# attempt 3: does not work either
number = driver.find_element(By.CSS_SELECTOR, "span.uuBRH")

Please suggest how I can extract the text. Thank you in advance.

Solution

The class attribute values such as biGQs, fiohW, and uuBRH are dynamically generated and are bound to change sooner or later. They may change the next time you load the application afresh, or even on the next application startup, so they should not be used in locators.

To extract the text "90 places sorted by traveler favorites", you should ideally use WebDriverWait with visibility_of_element_located(), so that the element is rendered before you read it. You can use either of the following locator strategies:

Using CSS_SELECTOR and text attribute:

driver.get("https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section[data-automation=WebPresentation_WebSortDisclaimer] div > span > span"))).text)

Using XPATH and get_attribute("innerHTML"):

driver.get("https://www.tripadvisor.com/Attraction_Products-g28922-t21629-zfg21594-Alabama.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//section[@data-automation='WebPresentation_WebSortDisclaimer']//div/span/span"))).get_attribute("innerHTML"))

Console output:
```
90 places sorted by traveler favorites
```
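The two variants above are not fully interchangeable: text returns the rendered, visible text, whereas get_attribute("innerHTML") returns the raw markup inside the element, including any leading newline or nested tags. As a rough sketch of how the raw value could be normalized using only the standard library, the hypothetical helper inner_html_to_text below (not part of Selenium) strips tags and collapses whitespace. It only approximates what text does and ignores CSS visibility.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def inner_html_to_text(inner_html):
    """Hypothetical helper: strip tags and collapse whitespace in an innerHTML string."""
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()  # flush any buffered trailing text
    return re.sub(r"\s+", " ", "".join(collector.chunks)).strip()


# The span in the question starts with a newline before the text.
raw = "\n90 places sorted by traveler favorites"
print(inner_html_to_text(raw))
```

If the element contains only plain text, as it appears to here, calling .strip() on the innerHTML value would likely be enough.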

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".

References

Links to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium