Tags: python, python-2.7, selenium-webdriver, infinite-scroll, html-content-extraction

Extracting links with a specific class with Selenium in Python


I am trying to extract links from an infinite-scroll website.

Here is my code for scrolling down the page:

import time
from selenium import webdriver

driver = webdriver.Chrome('C:\\Program Files (x86)\\Google\\Chrome\\chromedriver.exe')
driver.get('http://seekingalpha.com/market-news/top-news')
# Scroll to the bottom twice, pausing so newly loaded items can render
for i in range(0, 2):
    driver.implicitly_wait(15)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(20)

I want to extract specific links from this page: those with class "market_current_title", whose HTML looks like this:

<a class="market_current_title" href="/news/3223955-dow-wraps-best-week-since-2011-s-and-p-strongest-week-since-2014" sasource="titles_mc_top_news" target="_self">Dow wraps up best week since 2011; S&amp;P in strongest week since 2014</a>

When I used

URL = driver.find_elements_by_class_name('market_current_title')

I got the error "stale element reference: element is not attached to the page document". Then I tried

URL = driver.find_elements_by_xpath("//div[@id='a']//a[@class='market_current_title']")

but it reports that no such link exists. Do you have any idea how to solve this problem?


Solution

  • You're probably trying to interact with an element that has already changed (most likely elements above your scroll position that are now off screen). See this answer for some good options on how to overcome this.

    Here's a snippet:

    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    import selenium.webdriver.support.expected_conditions as EC
    import selenium.webdriver.support.ui as ui
    
    # Return True if the element becomes visible within `timeout` seconds, otherwise False
    def is_visible(driver, locator, timeout=2):
        try:
            ui.WebDriverWait(driver, timeout).until(
                EC.visibility_of_element_located((By.CSS_SELECTOR, locator)))
            return True
        except TimeoutException:
            return False
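
One way to avoid holding on to stale references, sketched here under some assumptions, is to re-locate the links after scrolling has finished and read each `href` right away, retrying the lookup if the page re-renders in between. The helper name `collect_hrefs` and its `attempts` and `pause` parameters are hypothetical and not part of Selenium; the selector `a.market_current_title` is taken from the HTML in the question.

```python
import time

try:
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By
    CSS = By.CSS_SELECTOR
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium installed.
    class StaleElementReferenceException(Exception):
        pass
    CSS = "css selector"


def collect_hrefs(driver, css_selector, attempts=3, pause=1.0):
    """Return the href of every element matching css_selector.

    Hypothetical helper: elements are located again on each attempt, so a
    reference that went stale during a re-render is never reused.
    """
    for attempt in range(attempts):
        try:
            elements = driver.find_elements(CSS, css_selector)
            # Read the attribute immediately and keep only plain strings,
            # not WebElement objects that may become stale later.
            return [el.get_attribute("href") for el in elements]
        except StaleElementReferenceException:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(pause)  # let the page settle before looking again
    return []
```

After the scrolling loop, this could be called as `links = collect_hrefs(driver, "a.market_current_title")`. Storing strings rather than elements means later scrolling or re-rendering cannot invalidate what was collected.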