I'm trying to write a Python script to scrape the hyperlink destinations (i.e., the href values) of every entry on this Web of Science search results page:
I would like to do it using Selenium in combination with Microsoft Edge WebDriver. I already have Selenium and Edge WebDriver installed. I have had partial success using searches such as
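from selenium.webdriver.common.by import By

# The locator is a CSS selector, so By.CSS_SELECTOR is the matching strategy (not By.TAG_NAME).
elements = driver.find_elements(By.CSS_SELECTOR, 'a[data-ta="summary-record-title-link"]')
print(len(elements))
for element in elements:
    print(element.get_attribute('href'))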
But for some reason, I can only get the href values for the first 4 search results, when in fact there are 50 search results on the webpage. What am I doing wrong?
The answer, as I later found out, was that I needed to ask Selenium to scroll down. The webpage is loaded dynamically, so the remaining results are only present once you scroll down to them. Alternatively, you can ask Selenium to reduce the zoom factor so that all of the elements become visible at once.
To scroll down to the bottom of the screen:
# Scrolls down to the bottom of the page.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
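If a single jump to the bottom does not bring in every result (which may happen when records are loaded only as they come into view), scrolling in smaller steps and collecting links along the way is one option. The following is a minimal sketch under that assumption, not a tested solution for this particular site. The helper name `collect_hrefs` and its `pause` and `max_steps` parameters are hypothetical, and it only relies on `driver.execute_script`:

```python
import time


def collect_hrefs(driver, selector, pause=0.5, max_steps=200):
    """Scroll down one viewport at a time and collect the hrefs of matching links.

    `collect_hrefs`, `pause` and `max_steps` are illustrative names, not part
    of Selenium. `driver` is any object that provides `execute_script`.
    """
    hrefs = []          # hrefs in the order they were first seen
    seen = set()        # used to skip duplicates across scroll steps
    last_offset = -1
    for _ in range(max_steps):
        # Read the href of every matching link currently in the DOM.
        found = driver.execute_script(
            "return Array.from(document.querySelectorAll(arguments[0]))"
            ".map(a => a.href);",
            selector,
        )
        for href in found:
            if href and href not in seen:
                seen.add(href)
                hrefs.append(href)
        # Scroll one viewport further and give the page time to load.
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)
        offset = driver.execute_script("return window.pageYOffset;")
        if offset == last_offset:
            break  # the page did not move, so the bottom has been reached
        last_offset = offset
    return hrefs
```

With the driver from the question, it could be called as `collect_hrefs(driver, 'a[data-ta="summary-record-title-link"]')`. The pause is a guess; a slow connection may need a longer one, since the loop stops as soon as a scroll step no longer moves the page.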
To change the zoom factor:
# Changes the zoom of the page to 10%.
driver.execute_script("document.body.style.zoom='10%'")