
Unable to click Next button using selenium as number of pages are unknown


I am new to Selenium and am trying to scrape:

https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/

I need all the details mentioned on this page, and on the other pages as well.

There are more pages containing the same kind of information, and I need to scrape them too. I tried to reach them by changing the target URL:

https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/40

but the trailing number changes and does not match the page number. Page 3 has 40 at the end, while page 5 is:

https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/80

so I am not able to get the data that way.
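For what it's worth, the trailing number looks like a zero-based result offset rather than a page number: with 20 results per page, page 3 starts at result 40 and page 5 at result 80. If that pattern holds (an assumption based only on the two URLs above), the page URLs could be built like this:

```python
BASE = "https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/"

def page_url(page, per_page=20):
    """Build the URL for a 1-based page number.

    Assumes the trailing number is a result offset of 20 per page,
    inferred from page 3 -> /40 and page 5 -> /80; this is a guess,
    not documented behaviour of the site.
    """
    offset = (page - 1) * per_page
    return BASE if offset == 0 else f"{BASE}{offset}"
```

Note that this only works for as many pages as actually exist, so it does not by itself solve the "unknown number of pages" problem.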

Here is my code:-

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/")
dist = []

def extract_url():
    # Collect the detail-page links from the current result page
    url = driver.find_elements(By.XPATH, "//h2[@class='resultTitle']//a")
    for i in url:
        dist.append(i.get_attribute("href"))

    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")

    # Click the Next button
    driver.find_element(By.XPATH, "//li[@class='btnNextPre']//a").click()

for _ in range(10):
    extract_url()

This works fine till page 5 but not after that. Could you please suggest how I can iterate over the pages when the number of pages is unknown, and extract data till the last page?


Solution

  • You need to check whether the pagination link is disabled. Use an infinite loop and break when the Next button becomes disabled.

    Use WebDriverWait() to wait for the result elements to become visible before reading them.

    Code:

    driver.get("https://www.asklaila.com/search/Delhi-NCR/-/book-distributor/")
    counter = 1
    while True:
        # Wait until the result links on the current page are visible
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h2.resultTitle > a")))
        urllist = [item.get_attribute('href') for item in driver.find_elements(By.CSS_SELECTOR, "h2.resultTitle > a")]
        print(urllist)
        print("Page number: " + str(counter))
        # Check for the disabled Next button BEFORE clicking, otherwise the
        # click on the last page fails and the loop never reaches the check
        if len(driver.find_elements(By.XPATH, "//li[@class='disabled']//a[text()='>']")) > 0:
            print("Next button disabled, last page reached")
            break
        driver.execute_script("arguments[0].click();", driver.find_element(By.CSS_SELECTOR, "ul.pagination > li.btnNextPre > a"))
        time.sleep(2)  # to slow down the loop
        counter = counter + 1
    

    Import the libraries below.

    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    import time
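    Note that urllist is rebuilt on every iteration of the loop, so if you want all links in one place you have to accumulate them yourself. A minimal sketch of a helper for that (the merge_unique name is my own, not part of the answer above): collect each page's urllist into a pages list inside the loop, then flatten it once the loop breaks, dropping duplicates while preserving order.

```python
def merge_unique(pages):
    """Flatten per-page URL lists into one list, keeping first occurrences only."""
    seen = set()
    merged = []
    for urls in pages:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

    Inside the while loop you would do pages.append(urllist), and call merge_unique(pages) after the break.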