python selenium css-selectors webdriverwait js-scrollintoview

Selenium crashes when I'm trying to parse the next page (and seven after it) on a website. Any way to tackle this?


I want to parse IMDb film ratings from a list that spans around 8 pages. To do that I'm using Selenium, but I'm having trouble clicking through to the next page. In the end I need 1000 titles, which I'll then process with BeautifulSoup. The code below isn't working; I need to click the 'NEXT' button, which has this HTML:

<a class="flat-button lister-page-next next-page" href="/list/ls000004717/?page=2">
            Next
        </a>

This is the code:

from selenium import webdriver as wb
browser = wb.Chrome()
browser.get('https://www.imdb.com/list/ls000004717/')
field = browser.find_element_by_name("flat-button lister-page-next next-page").click()

The error is the following:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".flat-button lister-page-next next-page"}
  (Session info: chrome=78.0.3904.108)

I suppose I lack knowledge of syntax needed, or maybe I mixed it up a little. I tried searching on SO, though every example is pretty unique and I don't possess the knowledge to extrapolate these cases fully. Any way Selenium can handle that?


Solution

  • There are a couple of ways that could work. The first is to use a selector for the Next button and loop until the end:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as ec
    
    browser = webdriver.Chrome()
    browser.get('https://www.imdb.com/list/ls000004717/')
    selector = 'a[class*="next-page"]'
    
    num_pages = 10
    for page in range(num_pages):
    
        # Wait for the element to load
        WebDriverWait(browser, 10).until(ec.presence_of_element_located((By.CSS_SELECTOR, selector)))
        # ... Do rating parsing here
    
        # find_element_by_css_selector was removed in Selenium 4.3
        browser.find_element(By.CSS_SELECTOR, selector).click()
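
If the number of pages is not known in advance, the loop can stop as soon as no Next link is found. The following is only a minimal sketch under some assumptions: `click_through_pages` and `parse_page` are hypothetical names, and `driver` is expected to behave like a Selenium WebDriver. It passes the locator strategy as the plain string "css selector", which is the value of `By.CSS_SELECTOR`, and uses `find_elements` because that returns an empty list instead of raising when nothing matches.

```python
def click_through_pages(driver, selector, max_pages, parse_page):
    """Parse up to max_pages pages, clicking the Next link between them.

    Returns the number of pages that were parsed.
    """
    parsed = 0
    for _ in range(max_pages):
        # Hypothetical callback: extract whatever is needed from the page
        parse_page(driver)
        parsed += 1
        if parsed == max_pages:
            break  # page limit reached, no need to click again

        # An empty list means there is no Next link on this page
        next_links = driver.find_elements("css selector", selector)
        if not next_links:
            break  # assumed to be the last page
        next_links[0].click()
    return parsed
```

It could be called as `click_through_pages(browser, 'a[class*="next-page"]', 10, parse_ratings)`, where `parse_ratings` is a placeholder for your own function. Any explicit wait, such as the `WebDriverWait` call above, can go at the start of that function.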
    

    Instead of clicking on the element, the other option is to navigate to the next page directly using browser.get('...'):

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as ec
    
    # Set up browser as before and navigate to the page
    browser = webdriver.Chrome()
    browser.get('https://www.imdb.com/list/ls000004717/')
    selector = 'a[class*="next-page"]'
    base_url = 'https://www.imdb.com/list/ls000004717/'
    page_extension = '?page='
    
    # Already on page 1, so the loop only needs to run 9 times
    num_pages = 10
    for page in range(2, num_pages + 1):
        # Wait for the page to load
        WebDriverWait(browser, 10).until(ec.presence_of_element_located((By.CSS_SELECTOR, selector)))
        # ... Do rating parsing here
    
        next_page = base_url + page_extension + str(page)
        browser.get(next_page)
    
    # The loop parses every page except the last one loaded; parse that page here
    

    As a note: field = browser.find_element_by_name("...").click() will not assign a WebElement to field, because the click() method has no return value, so field ends up as None.
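
As for why the original lookup fails: find_element_by_name matches the name attribute, which the link in the question does not have. The selector in the error message, ".flat-button lister-page-next next-page", is also not what was intended. The spaces make it a descendant selector, so it looks for a `<next-page>` tag inside a `<lister-page-next>` tag inside an element with class flat-button. To match one element that carries several classes, the class names have to be chained with dots and no spaces. A small sketch of that conversion, where `classes_to_css` is a hypothetical helper and not part of Selenium:

```python
def classes_to_css(tag, class_attr):
    """Turn a tag name and a space-separated class attribute into a CSS selector."""
    # Each class name gets a leading dot, with no spaces in between
    return tag + "".join("." + name for name in class_attr.split())


selector = classes_to_css("a", "flat-button lister-page-next next-page")
# selector is now "a.flat-button.lister-page-next.next-page"
```

The result can be passed to `browser.find_element(By.CSS_SELECTOR, selector)`. Matching on a single class, as the `a[class*="next-page"]` selector above does, is usually enough.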