Tags: python, html, selenium, href

Python Selenium - Get Link from Within a Class


I am trying to scrape the href from the following HTML, but I need the text of the second td with class data to identify which link it is:

<tr>
<td class="data">
    <a target="_new" title="Title" href="https://somesite.com/file_to_scrape.pdf">Scraped Class</a>
<br>
</td>
<td class="data">Text to Identify Above Link</td>
<td class="data">Not Relevant Text</td>
</tr>

The first thing I do is pull back a list of all elements that have the class data:

ls_class = driver.find_elements_by_class_name("data")

but when I loop through:

for clas in ls_class:
   print(clas.text)
   print(clas.get_attribute('href'))

The output is:

Scraped Class
None
Text to Identify Above Link
None
Not Relevant Text
None

How can I get the href of the nested link when a data cell contains one?


Solution

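  • I got it to work using a solution posted here. The href belongs to the a element nested inside the td, not to the td itself, so each cell has to be searched for a link: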

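     from selenium.common.exceptions import NoSuchElementException
     from selenium.webdriver.common.by import By

     # Selenium 4 syntax; the find_element(s)_by_* helpers were removed in 4.3.
     ls_class = driver.find_elements(By.XPATH, "//td[@class='data']")

     for clas in ls_class:
         print(clas.text)
         try:
             # The href is on the <a> nested inside the <td>, not on the <td> itself.
             print(clas.find_element(By.CSS_SELECTOR, 'a').get_attribute('href'))
         except NoSuchElementException:
             print("No Link")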
    

    Now my output is:

    Scraped Class
    https://somesite.com/file_to_scrape.pdf
    Text to Identify Above Link
    No Link
    Not Relevant Text
    No Link
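
The loop above prints every cell, but the question asks for the link that belongs to one particular identifying text. Below is a minimal sketch of that lookup, assuming Selenium 4 and the table layout from the question. The helper names link_xpath_for_label and href_for_label are hypothetical, not part of Selenium. It uses find_elements (plural), which returns an empty list when nothing matches, so no try/except is needed.

```python
def link_xpath_for_label(label):
    """Build an XPath for the <a> in the row whose data cell has the given text."""
    # XPath 1.0 string literals have no escape sequences, so wrap the label
    # in whichever quote character it does not contain.
    if "'" not in label:
        quoted = f"'{label}'"
    elif '"' not in label:
        quoted = f'"{label}"'
    else:
        raise ValueError("label contains both quote types; use XPath concat()")
    # Pick the <tr> that has a data cell with this exact (whitespace-normalized)
    # text, then take any <a> directly inside one of that row's data cells.
    return (
        f"//tr[td[@class='data'][normalize-space()={quoted}]]"
        "/td[@class='data']/a"
    )


def href_for_label(driver, label):
    """Return the href of the first link in the matching row, or None."""
    # "xpath" is the string value of selenium's By.XPATH; the plain string
    # keeps this sketch free of an import-time dependency on selenium.
    links = driver.find_elements("xpath", link_xpath_for_label(label))
    return links[0].get_attribute("href") if links else None
```

With the HTML from the question, href_for_label(driver, "Text to Identify Above Link") should return the file_to_scrape.pdf URL, and a label that matches no row gives None.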