python selenium-webdriver

Selenium href for loop not completing


I've been stuck on this problem for a couple of weeks now and have looked everywhere for a solution. I'm trying to scrape some information from The Sun's sports page. I can get the title of each box and even the short description underneath it, but when I try to get the href, the loop stops partway through and throws this error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//a"}

This is the code that I have been running. I initially thought it was an element-finding issue until I added print(link), as I was only getting back the first link or just the error. I have tried changing the way the element is located (by XPath, CSS selector, tag name, and so on), but with no luck.

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

website = 'https://www.thesun.co.uk/sport/football/'
path = 'D:\\Programming\\Automate with Python\\Automating\\chromedriver_win32'
chrome_options = Options()
chrome_options.add_experimental_option('detach', True)

service = Service(executable_path=path)
browser = webdriver.Chrome(options=chrome_options)
browser.get(website)

containers = browser.find_elements(by="xpath", value='//div[@class="teaser__copy-container"]')

titles = []
sub_titles = []
links = []

for container in containers:
    title = container.find_element(By.CSS_SELECTOR, 'span').get_attribute("textContent")
    sub_title = container.find_element(By.CSS_SELECTOR, 'h3').get_attribute("textContent")
    link = container.find_element(By.XPATH, './/a').get_attribute("href")
    titles.append(title)
    sub_titles.append(sub_title)
    links.append(link)
    print(link)

df_headlines = pd.DataFrame({'title': titles, 'sub-title': sub_titles, 'links': links})
df_headlines.to_csv('headline.csv')

Could it be a broken link on the website? Any help would be appreciated, as this has been a bit of a challenge and I would like to solve it. Thanks in advance.


Solution

  • It fails because one of the containers doesn't have an "a" element. You can check this in the dev tools console:

    $x("//div[@class='teaser__copy-container']") 
    

    returns 68 elements,

    $x("//div[@class='teaser__copy-container']//a")

    returns 67.

    To find out which element it is, run $x("//div[@class='teaser__copy-container' and not(.//a)]")

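    To fix it, change the line that collects the containers as shown below, so that only containers with an "a" element inside are selected. Alternatively, wrap the lookup in try-except (or use find_elements) and fall back to an empty link as the default value.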

    containers = browser.find_elements(by="xpath", value='//div[@class="teaser__copy-container" and .//a]')
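
The second option (a default value for containers without a link) could be sketched roughly as follows. This is only a sketch: first_href is a hypothetical helper name, not part of Selenium. It relies on find_elements (plural) returning an empty list rather than raising NoSuchElementException when nothing matches, and passes the locator strategy as the plain string "xpath", as the question's own code does.

```python
def first_href(container, default=""):
    """Return the href of the first <a> inside `container`, or `default`.

    `container` is assumed to be a Selenium WebElement (or anything that
    exposes the same find_elements / get_attribute methods).
    """
    # find_elements returns [] when no element matches, so no try-except
    # is needed to handle a container without a link.
    anchors = container.find_elements("xpath", ".//a")
    if anchors:
        return anchors[0].get_attribute("href")
    return default
```

In the question's loop, the failing line would then become link = first_href(container), so the container without a link still produces a row in the CSV, with an empty links column, instead of stopping the loop.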