Tags: python, selenium, xpath, href, webdriverwait

Extracting multiple text values using partial href information


I'm trying to extract multiple genres from the site below (I already know the URLs): https://www.discogs.com/master/1515454-Zedd-Katy-Perry-365

<div class="profile">
  <h1 id="profile_title" class="hide_mobile has_action_menu">...</h1>
  <div class="head">Genre:</div>
  <div class="content">
    <a href="/genre/electronic">Electronic</a>,
    <a href="/genre/pop">Pop</a>
  </div>
</div>


And here's my Python code:

genre = None
try:
  genre = driver.find_element_by_xpath("[contains(concat(' ', @class, ' '), ' profile ')]//*[contains(@href, ' /genre/* '").text

How do I extract the genres as text (e.g. Electronic, Pop)?


Solution

  • To extract and print the Genre values (e.g. Electronic, Pop) from the page, induce WebDriverWait for visibility_of_all_elements_located() with the following locator strategy:

    • Using XPath:

      driver.get("https://www.discogs.com/master/1515454-Zedd-Katy-Perry-365")
      # Dismiss the cookie-consent banner before reading the page
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
      # Wait until the links in the cell following the "Genre" row header are visible, then print their text
      genre_links = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr/th[@scope='row' and contains(., 'Genre')]//following::td[1]//a")))
      print([my_elem.text for my_elem in genre_links])
      
    • Console Output:

      ['Electronic', 'Pop']
      
    • Note: You have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
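
  • Alternative: since the question asks about matching on a partial href, here is a minimal sketch that selects the links by href prefix instead of by table position. It assumes the genre links still have hrefs beginning with /genre/, as in the markup quoted in the question, and that no hidden links share that prefix (otherwise the visibility wait would time out and the XPath would need narrowing). The names GENRE_LINKS_XPATH, genres_as_text and extract_genres are hypothetical helpers made up for this example, not part of Selenium.

```python
# Assumption: genre links have hrefs that begin with "/genre/",
# as in the markup quoted in the question.
GENRE_LINKS_XPATH = "//a[starts-with(@href, '/genre/')]"


def genres_as_text(link_elements):
    """Join the non-empty link texts into a string such as 'Electronic, Pop'."""
    names = [element.text.strip() for element in link_elements]
    return ", ".join(name for name in names if name)


def extract_genres(driver, timeout=20):
    """Wait until the genre links are visible, then return their text."""
    # Imported lazily so genres_as_text() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    links = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.XPATH, GENRE_LINKS_XPATH))
    )
    return genres_as_text(links)
```

    With the driver already on the page and the cookie banner dismissed as in the snippet above, genre = extract_genres(driver) would give the genres as a single comma-separated string, provided the href assumption holds for the current page layout.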