Tags: python-3.x, selenium, web-scraping, list-comprehension, href

Unable to retrieve the href attributes using Python and Selenium


I'm very new to this and have spent hours trying various methods I've read here. Apologies if I'm making some silly mistake.

I want to create a database of my LEGO sets, pulling images and info from brickset.com.

I'm using:

anchors = driver.find_elements_by_xpath('//*[@id="ui-tabs-2"]/ul/li[1]/a')
anchors = [a.get_attribute('href') for a in anchors]

print (anchors) returns:

anchors = driver.find_elements_by_xpath('//*[@id="ui-tabs-2"]/ul/li[1]/a')

What I'm trying to target:

<div id="ui-tabs-2" class="ui-tabs-panel ui-widget-content ui-corner-bottom" aria-live="polite" aria-labelledby="ui-id-4" role="tabpanel" aria-expanded="true" aria-hidden="false" style="display: block;">
<ul class="moreimages">
<li>
<a href="https://images.brickset.com/sets/AdditionalImages/21054-1/21054_alt10.jpg" class="highslide plain " onclick="return hs.expand(this)">
<img src="https://images.brickset.com/sets/AdditionalImages/21054-1/tn_21054_alt10_jpg.jpg" title="" onerror="this.src='/assets/images/spacer2.png'" loading="lazy">
</a><div class="highslide-caption">

I'm losing my mind trying to figure this out.

Update: Still not getting the href attributes. To add more detail, I'm trying to get the images under the "images" tab on this URL: https://brickset.com/sets/21330-1/Home-Alone

Here is the problematic code:

anchors = driver.find_elements(By.XPATH, '//*[@id="ui-tabs-2"]/ul/li/a')
links = [anchors.get_attribute('href') for a in anchors]
print('Found ' + str(len(anchors)) + ' links to images')

I've also tried:

#anchors = driver.find_elements_by_css_selector("a[href*='21330']")

This only returned one href, even though there should be about a dozen.

Thank you all for the assistance!


Solution

  • You shouldn't reuse the same variable name for two different things.

    As per the first line of code:

    anchors = driver.find_elements_by_xpath('//*[@id="ui-tabs-2"]/ul/li[1]/a')
    

    anchors is the list of WebElements. To create a second list holding the href attributes, use a different name, e.g. hrefs, so the list of elements is not overwritten.

    Effectively your code block will be:

    anchors = driver.find_elements_by_xpath('//*[@id="ui-tabs-2"]/ul/li[1]/a')
    hrefs = [a.get_attribute('href') for a in anchors]
    print(hrefs)
    

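    Using a list comprehension in a single line (the square brackets are needed, otherwise print shows a generator object instead of the values):

    print([a.get_attribute('href') for a in driver.find_elements_by_xpath('//*[@id="ui-tabs-2"]/ul/li[1]/a')])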
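
Regarding the code in the update: there, get_attribute is called on anchors (the list) instead of on the loop variable a, and a Python list has no such method, so that line raises an AttributeError. Also note that li[1] in the XPath matches only the first list item, while li without an index matches all of them. Below is a minimal sketch of how the pieces could fit together. StubAnchor, collect_hrefs and find_image_links are names made up for this illustration, and the example.com URLs are placeholders. The sketch assumes Selenium 4, where locators are passed as find_elements(By.XPATH, ...); as far as I know the find_elements_by_xpath helper was removed in Selenium 4.3.

```python
class StubAnchor:
    """Hypothetical stand-in for a Selenium WebElement, for illustration only."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        # Mimics WebElement.get_attribute for the one attribute we need.
        return self._href if name == "href" else None


def collect_hrefs(anchors):
    """Build a new list of href values; note the loop variable `a`, not `anchors`."""
    return [a.get_attribute("href") for a in anchors]


def find_image_links(driver, xpath='//*[@id="ui-tabs-2"]/ul/li/a'):
    """Locate the anchors with the Selenium 4 locator API and return their hrefs."""
    # Imported lazily so collect_hrefs can be tried without Selenium installed.
    from selenium.webdriver.common.by import By

    # `li` without an index matches every list item; `li[1]` matches only the first.
    anchors = driver.find_elements(By.XPATH, xpath)
    return collect_hrefs(anchors)


# Try the helper with stand-in elements (no browser needed).
stubs = [
    StubAnchor("https://example.com/a.jpg"),
    StubAnchor("https://example.com/b.jpg"),
]
hrefs = collect_hrefs(stubs)
print("Found " + str(len(hrefs)) + " links to images")
print(hrefs)
```

With a real browser session you would call find_image_links(driver) after loading the set page. Keeping the elements (anchors) and the extracted strings (hrefs) under separate names is the same point made above.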