Tags: python, selenium, selenium-webdriver, web-scraping, webdriverwait

Web scraping with Selenium and BeautifulSoup can't find anything


I am trying to extract the descriptions of all the links inside the elements with class="publication u-padding-xs-ver js-publication" on this website: https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess

I tried with both BeautifulSoup and Selenium, but neither of them extracts anything.

Here is the code I am using:

options = Options()
options.add_argument("headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
ul = driver.find_element(By.ID, "publication-list")
print("Links")
allLi = ul.find_elements(By.TAG_NAME, "li")
for count, li in enumerate(allLi, start=1):
    print("Links " + str(count) + " " + li.text)

Solution

  • You are missing waits.
    You have to wait for the elements to become visible before accessing them.
    The best way to do that is with explicit waits, that is, WebDriverWait combined with expected_conditions.
    The following code works:

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    
    # Raw string, so the backslashes in the Windows path are not read as escape sequences.
    webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
    driver = webdriver.Chrome(options=options, service=webdriver_service)
    wait = WebDriverWait(driver, 20)
    
    url = "https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess"
    driver.get(url)
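    # Wait until the list container is visible before looking for its items.
    ul = wait.until(EC.visibility_of_element_located((By.ID, "publication-list")))
    # Note: this locator matches every <li> on the page, not only those inside the list.
    allLi = wait.until(EC.presence_of_all_elements_located((By.TAG_NAME, "li")))
    print(len(allLi))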
    

    The output is:

    167
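
The code above only counts the <li> elements. To get the link texts the question asks about, something along these lines might work. It is a rough sketch, not a tested solution for this site: the helper names describe_links and scrape are made up for illustration, the CSS selector "#publication-list li a" is an assumption about the page markup, and calling webdriver.Chrome() without a Service assumes a Selenium version (4.6 or later) that can locate the driver on its own.

```python
def describe_links(anchors):
    """Return (text, href) pairs for objects exposing .text and .get_attribute()."""
    rows = []
    for a in anchors:
        text = a.text.strip()
        if text:  # skip anchors that have no visible text
            rows.append((text, a.get_attribute("href")))
    return rows


def scrape(url, timeout=20):
    """Open the page, wait for the list, and return (text, href) pairs."""
    # Imported here so describe_links() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Selenium 4.6+ finds the driver itself
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Same explicit wait as in the answer: the list container must be visible.
        wait.until(EC.visibility_of_element_located((By.ID, "publication-list")))
        # Assumption: each entry is an <a> inside an <li> of #publication-list.
        locator = (By.CSS_SELECTOR, "#publication-list li a")
        anchors = wait.until(EC.presence_of_all_elements_located(locator))
        # Read the text before quitting; the elements are invalid afterwards.
        return describe_links(anchors)
    finally:
        driver.quit()
```

Calling scrape(url) with the URL from the answer should then return one (text, href) pair per link with visible text. Scoping the selector to #publication-list keeps unrelated <li> elements elsewhere on the page out of the result.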