python python-2.7 selenium-webdriver infinite-scroll html-content-extraction

Extracting links with a specific class using Selenium in Python

I am trying to extract links from an infinite-scroll website.

Here is my code for scrolling down the page:

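import time
from selenium import webdriver

# Path to the ChromeDriver executable
driver = webdriver.Chrome('C:\\Program Files (x86)\\Google\\Chrome\\chromedriver.exe')
driver.implicitly_wait(15)  # set once; applies to every later element lookup
driver.get('http://seekingalpha.com/market-news/top-news')
for i in range(2):
    # Scroll to the bottom so the page loads more items
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(20)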

I want to extract specific links from this page: the ones with class="market_current_title", whose HTML looks like this:

<a class="market_current_title" href="/news/3223955-dow-wraps-best-week-since-2011-s-and-p-strongest-week-since-2014" sasource="titles_mc_top_news" target="_self">Dow wraps up best week since 2011; S&amp;P in strongest week since 2014</a>

When I used

URL = driver.find_elements_by_class_name('market_current_title')

I got the error "stale element reference: element is not attached to the page document". Then I tried

URL = driver.find_elements_by_xpath("//div[@id='a']//a[@class='market_current_title']")

but that reports that no such link exists. Any idea how to solve this problem?

Solution

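You're probably trying to interact with an element that has changed since you located it (most likely one that is now above your scroll position and off screen). Selenium raises a stale element reference when the reference you hold no longer points to an element attached to the current page. Try this answer for some good options on how to overcome this.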

Here's a snippet:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
import selenium.webdriver.support.expected_conditions as EC
import selenium.webdriver.support.ui as ui

# Return True if the element becomes visible within `timeout` seconds, otherwise False
def is_visible(driver, locator, timeout=2):
    try:
        ui.WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.CSS_SELECTOR, locator)))
        return True
    except TimeoutException:
        return False
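
Another option, not taken from the linked answer, is to avoid holding element references at all: finish scrolling first, then read every href in a single JavaScript call so that only plain strings come back to Python. The sketch below assumes that approach suits your page. The helper names scroll_to_bottom and collect_hrefs, the pause and max_rounds parameters, and the a.market_current_title selector (built from the class in the question) are illustrative assumptions, not part of Selenium.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    """Scroll until the page height stops growing or max_rounds is reached.

    Hypothetical helper: `pause` and `max_rounds` are illustrative defaults.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return last_height


def collect_hrefs(driver, css_selector):
    """Return the href of every matching link as a list of plain strings.

    Hypothetical helper: the hrefs are read inside the browser in one call,
    so Python never keeps a WebElement that could go stale.
    """
    script = (
        "return Array.prototype.map.call("
        "document.querySelectorAll(arguments[0]),"
        "function (a) { return a.href; });"
    )
    return driver.execute_script(script, css_selector)
```

With the driver from the question you would call scroll_to_bottom(driver) and then collect_hrefs(driver, "a.market_current_title"). Because a.href is read in the browser, the returned URLs should be absolute rather than the relative /news/... paths seen in the HTML.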