I am attempting to scrape the href from the following HTML, but I need the text of the second data cell to identify which href to take:
<tr>
<td class="data">
<a target="_new" title="Title" href="https://somesite.com/file_to_scrape.pdf">Scraped Class</a>
<br>
</td>
<td class="data">Text to Identify Above Link</td>
<td class="data">Not relevant text</td>
</tr>
The first thing I do is pull back a list of all elements with the class data:
ls_class = driver.find_elements_by_class_name("data")
but when I loop through:
for clas in ls_class:
    print(clas.text)
    print(clas.get_attribute('href'))
The printout is:
Scraped Class
None
Text to Identify Above Link
None
Not relevant text
None
How can I get the nested href when one is present inside a data element?
I got it to work using a solution posted here:
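from selenium.common.exceptions import NoSuchElementException

ls_class = driver.find_elements_by_xpath("//td[@class='data']")
for clas in ls_class:
    print(clas.text)
    try:
        # Look for a link nested inside this cell
        print(clas.find_element_by_css_selector('a').get_attribute('href'))
    except NoSuchElementException:
        # Cells without an <a> child raise NoSuchElementException
        print("No Link")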
Now my output is:
Scraped Class
https://somesite.com/file_to_scrape.pdf
Text to Identify Above Link
No Link
Not relevant text
No Link
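Since the goal was to pick the link by the text of a neighbouring cell, here is a minimal sketch of that lookup. To keep it runnable without a browser it uses Python's standard-library html.parser rather than Selenium, so it is an illustration of the idea and not the Selenium solution above. The names SAMPLE_HTML, DataCellParser and href_for_label are made up for this example; with Selenium, the markup could presumably be taken from driver.page_source instead of the hard-coded sample.

```python
from html.parser import HTMLParser

# Sample markup copied from the question above.
SAMPLE_HTML = """
<tr>
<td class="data">
<a target="_new" title="Title" href="https://somesite.com/file_to_scrape.pdf">Scraped Class</a>
<br>
</td>
<td class="data">Text to Identify Above Link</td>
<td class="data">Not relevant text</td>
</tr>
"""


class DataCellParser(HTMLParser):
    """Hypothetical helper: collects text and first href of each td.data cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell dicts per <tr>
        self._cell = None  # the td.data cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and "data" in (attrs.get("class") or "").split():
            self._cell = {"text": "", "href": None}
        elif tag == "a" and self._cell is not None and self._cell["href"] is None:
            # Remember only the first link nested inside the current cell.
            self._cell["href"] = attrs.get("href")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._cell["text"] = self._cell["text"].strip()
            if not self.rows:  # tolerate a <td> that appears outside any <tr>
                self.rows.append([])
            self.rows[-1].append(self._cell)
            self._cell = None


def href_for_label(html, label):
    """Return the first href found in the row that contains a cell whose text equals label."""
    parser = DataCellParser()
    parser.feed(html)
    for row in parser.rows:
        if any(cell["text"] == label for cell in row):
            for cell in row:
                if cell["href"]:
                    return cell["href"]
    return None


print(href_for_label(SAMPLE_HTML, "Text to Identify Above Link"))
# prints: https://somesite.com/file_to_scrape.pdf
```

Grouping the cells by row is what ties the identifying text to its link. A label that matches no row gives None, much like the "No Link" branch in the loop above.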