Tags: python, selenium, selenium-webdriver, webdriverwait, praw

Getting the text value of an HTML tag through Selenium Web Automation in Python?


I am making a Reddit bot that looks for certain attributes in comments, uses Selenium to visit the information website, and uses driver.find_element_by... to get the value inside a tag, but it is not working.

When I use driver.find_element_by_class_name(), this is the data returned:

<selenium.webdriver.remote.webelement.WebElement (session="f454dcf92728b9db4de080a27a844bf7", element="514bd57d-99d7-4fce-a05d-3fa92f66ad49")>

When I use driver.find_elements_by_css_selector(".style-scope.ytd-video-renderer"), this is returned:

[
  <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2", element="6b4ee3e2-5e6b-48e2-8ec8-9083bf15baea")>, 
  <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2", ...
]


This is the element I'm trying to locate (the calls above returned the WebElement output shown for this tag):

<yt-formatted-string class="style-scope ytd-video-renderer" aria-label="Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 by Melodic Star 2 months ago 4 minutes, 18 seconds 837,676 views">Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』</yt-formatted-string>

What I want

I want Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 returned.

What could I do?


Solution

  • You were close. driver.find_element_by_class_name() returns the first matching WebElement, and printing it gives:

    <selenium.webdriver.remote.webelement.WebElement (session="f454dcf92728b9db4de080a27a844bf7", element="514bd57d-99d7-4fce-a05d-3fa92f66ad49")>
    

    This is the representation of the WebElement object itself, not its text; the element may well contain the desired text.

    Similarly, driver.find_elements_by_css_selector(".style-scope.ytd-video-renderer") returns a list of matching WebElements, and printing it gives:

    [
      <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2", element="6b4ee3e2-5e6b-48e2-8ec8-9083bf15baea")>, 
      <selenium.webdriver.remote.webelement.WebElement (session="43cb953cde81df270260bf769fe081a2",
      ...
    ]
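
As a minimal sketch of the point above: each WebElement in such a list exposes a .text property, so the printed object representations can be turned into strings by reading it. The helper name element_texts is hypothetical (not part of Selenium), and it relies only on the .text attribute of whatever objects it is given.

```python
def element_texts(elements):
    """Return the stripped, non-empty .text of each WebElement-like object."""
    texts = []
    for element in elements:
        # .text holds the rendered text of the element; it can be empty
        # for elements that are hidden or have no text of their own.
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts


# Assumed usage with the call from the question (needs a live driver):
# elements = driver.find_elements_by_css_selector(".style-scope.ytd-video-renderer")
# print(element_texts(elements))
```

Skipping empty strings is a design choice for this sketch; a selector as broad as .style-scope.ytd-video-renderer is likely to match many elements that carry no text.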
    

    Solution

    To extract the text Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 from the following HTML:

    <yt-formatted-string class="style-scope ytd-video-renderer" aria-label="Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』 by Melodic Star 2 months ago 4 minutes, 18 seconds 837,676 views">Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』</yt-formatted-string>
    

    You can use either of the following Locator Strategies:

    • Using css_selector and get_attribute():

      print(driver.find_element_by_css_selector("yt-formatted-string.style-scope.ytd-video-renderer").get_attribute("innerHTML"))
      
    • Using xpath and text attribute:

      print(driver.find_element_by_xpath("//yt-formatted-string[@class='style-scope ytd-video-renderer']").text)
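
The two strategies above read the text in different ways: .text returns only the text that is rendered as visible, while get_attribute() reads a DOM property whether or not the element is shown. A hedged sketch that combines them is below. The helper name read_text is hypothetical, and using textContent instead of innerHTML as the fallback is an assumption made here so that any nested markup is not returned as raw HTML.

```python
def read_text(element):
    """Prefer the rendered text; fall back to the DOM's textContent."""
    text = element.text
    if text and text.strip():
        return text.strip()
    # .text is empty for elements that are not displayed, so fall back
    # to the textContent property, which does not depend on visibility.
    raw = element.get_attribute("textContent") or ""
    return raw.strip()


# Assumed usage (needs a live driver). The find_element_by_* helpers were
# removed in Selenium 4.3, so newer versions would spell the lookup as:
# from selenium.webdriver.common.by import By
# element = driver.find_element(
#     By.CSS_SELECTOR, "yt-formatted-string.style-scope.ytd-video-renderer")
# print(read_text(element))
```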
      

    Ideally, to print the text Sword Art Online: Alicization Lycoris Opening Full『ReoNa - Scar/let』, you should induce WebDriverWait for visibility_of_element_located(), using either of the following Locator Strategies:

    • Using CSS_SELECTOR and get_attribute():

      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "yt-formatted-string.style-scope.ytd-video-renderer"))).get_attribute("innerHTML"))
      
    • Using XPATH and text attribute:

      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//yt-formatted-string[@class='style-scope ytd-video-renderer']"))).text)
      
    • Note: You have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
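
WebDriverWait.until() accepts any callable that takes the driver, so the wait can also return the text directly instead of the element. The sketch below is one possible custom condition, not part of Selenium: the name visible_text is hypothetical, and it assumes that an element whose text is still empty should be treated as "not ready yet".

```python
def visible_text(locator):
    """Build a wait condition that yields the element's text once it is
    displayed and non-empty, and False until then."""
    def condition(driver):
        # locator is a (by, value) pair, e.g. (By.CSS_SELECTOR, "...").
        element = driver.find_element(*locator)
        if not element.is_displayed():
            return False
        text = element.text.strip()
        # A falsy return value makes WebDriverWait keep polling.
        return text or False
    return condition


# Assumed usage with the imports listed above (needs a live driver):
# title = WebDriverWait(driver, 20).until(
#     visible_text((By.CSS_SELECTOR,
#                   "yt-formatted-string.style-scope.ytd-video-renderer")))
# print(title)
```

If the element never appears or never gets text within the timeout, until() raises a TimeoutException, which the calling code may want to catch.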
      

    You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

