Search code examples
pythonseleniumxpathcss-selectorswebdriverwait

Using python and Selenium to scrape the innerText within an HTML element?


I wrote a script that uses the selenium and pyautogui modules to login and scrape a value from an element and print it, but it's printing two dashes --.

Here is the HTML that contains the value 417 which I want to retrieve:

<p id="totReqCountVal" class="trailer-0 avenir-regular font-size-4 text-green js-total-requests">417</p>

This is the relevant code I have tried:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser.get('website_to_be_scraped')
browser.find_element(By.ID, 'totReqCountVal')

I then tried:

views = browser.find_element(By.ID, 'totReqCountVal')
    print(views)

which returns:

(session="12e48df447f7df855a1ee596ba609a30", element="1027ec31-8cb8-4758-b4b0-82b85628ed6c")

With some help I have also tried the following:

Using CSS_SELECTOR and text attribute:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#totReqCountVal[class$='js-total-requests']"))).text)
Using XPATH and get_attribute("innerHTML"):

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@id='totReqCountVal' and contains(@class, 'js-total-requests')]"))).get_attribute("innerHTML"))

added the following imports :

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

I have checked through devtools if the locator strategies identifies the element uniquely, checked for iframes, and shadow root.

How do I retrieve the 417 value?


Solution

  • views is the WebElement which on printing rightly prints:

    (session="12e48df447f7df855a1ee596ba609a30", element="1027ec31-8cb8-4758-b4b0-82b85628ed6c")
    

    Solution

    To print the text 417 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:

    • Using CSS_SELECTOR and text attribute:

      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#totReqCountVal[class$='js-total-requests']"))).text)
      
    • Using XPATH and get_attribute("innerHTML"):

      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@id='totReqCountVal' and contains(@class, 'js-total-requests')]"))).get_attribute("innerHTML"))
      
    • Note : You have to add the following imports :

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      

    You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python


    References

    Link to useful documentation: