Search code examples
htmlseleniumbeautifulsouphrefwebdriverwait

Find href on web page


I don't understand why the following will not work - I am looking for and trying to click this specific link:

<a href="#/documents/2077">

From the URL: https://species-registry.canada.ca/index-en.html#/documents?documentTypeId=18&sortBy=documentTypeSort&sortDirection=asc&pageSize=10&keywords=Victoria%27s%20Owl-clover

From a starting point of that URL I have tried a few things including the following:

Attempt #1

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT,"COSEWIC-Assessment-and-status-report")))

and

appraisal_html = driver.find_element_by_partial_link_text("COSEWIC-Assessment-and-status-report")

Attempt #2

soup = bs(req.text,'html.parser')
for link in soup.find_all('a'):`
print(link.get('href'))`

Among other things. Keeping in mind that this is a generalized search in the sense that the species name will change every time I make this search, everything else should remain similar.

The second attempt is straight from the beautiful soup documentation and finds a whole bunch of links like the ones under the menu tab etc but not the href I am looking for.

The first attempt for some reason just times out without finding the partial text I input. Maybe this is because that is the text on the page and not the href itself?

One solution I am not thinking of is to look for the bounding box within which the link is found first and then look for the link within the new smaller search area but I still don't know why I am unable to find the right link from the entire page.


Solution

  • A couple of things here:

    • COSEWIC-Assessment-and-status-report isn't the exact text, but it is COSEWIC Assessment and Status Report on the Victoria’s Owl-clover

    • The text is not within the A tag but within a SPAN:

      <span data-v-7ee3c58f="" class="name-primary">COSEWIC Assessment and Status Report on the Victoria’s Owl-clover <em>Castilleja victoriae</em> in Canada</span>    
      

    So to identify the clickable element you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following Locator Strategies:

    • Using XPATH:

      driver.get("https://species-registry.canada.ca/index-en.html#/documents?documentTypeId=18&sortBy=documentTypeSort&sortDirection=asc&pageSize=10&keywords=Victoria%27s%20Owl-clover")
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH,"//span[contains(., 'COSEWIC Assessment and Status Report on the Victoria’s Owl-clover')]"))).click()
      
    • Note: You have to add the following imports :

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC