I used some code to extract the affiliation text from this page: https://www.sciencedirect.com/science/article/abs/pii/S0011916424004600. The affiliation text appears after you click "Show more" at the top of the page.
However, the code now prints an empty list, and I can't tell why.
This is the code that used to work:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
service = Service(r"Z:\Private\hbasamh\ACWA Power\Files\Jupyter\Web Scraping\chromedriver.exe")
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
#options.add_argument("--headless=new")
#options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=service, options=options)
url = 'https://www.sciencedirect.com/science/article/abs/pii/S0011916424004600'
driver.get(url)
time.sleep(3)
driver.find_element(By.XPATH, '//span[@class="button-link-text" and contains(text(), "Show more")]').click()
time.sleep(2)
soup = BeautifulSoup(driver.page_source, "html.parser")
txt = [x.get_text().strip() for x in soup.select('[class="AuthorGroups text-s"] dl dd')]
print(txt)
driver.quit()
And this is the expected output:
a
College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
b
Natural Sciences and Science Education, National Institute of Education, Nanyang Technological University, Singapore 637616, Singapore
c
Department of Science Education, Rey Juan Carlos University, Madrid 28942, Spain
Can anyone please let me know what is wrong?
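It looks like the website's HTML was updated.
I don't see any element with the class AuthorGroups text-s anymore; the class is now just AuthorGroups. The selector [class="..."] compares the whole class attribute against that exact string, so the old selector no longer matches anything, and the list comes back empty.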
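To check which class names the page actually uses now, you can search for a fragment of the old class instead of the exact string. The sketch below is a hedged illustration: SAMPLE_HTML is hypothetical, trimmed-down markup standing in for driver.page_source, and find_class_values is a helper name made up for this example. It relies on the CSS substring selector [class*="..."], which still matches after classes are added to or removed from the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup standing in for driver.page_source.
# The real ScienceDirect page is much larger and its structure may differ.
SAMPLE_HTML = """
<div class="AuthorGroups">
  <dl><dt>a</dt><dd>College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China</dd></dl>
</div>
"""


def find_class_values(html, fragment):
    """Return the full class attribute of every element whose class contains `fragment`."""
    soup = BeautifulSoup(html, "html.parser")
    # [class*="..."] is a substring match on the attribute value, so it finds
    # the element whether the class is "AuthorGroups" or "AuthorGroups text-s".
    # BeautifulSoup stores class as a list of tokens, hence the join.
    return [" ".join(tag["class"]) for tag in soup.select(f'[class*="{fragment}"]')]


print(find_class_values(SAMPLE_HTML, "AuthorGroups"))  # ['AuthorGroups']
```

Running this against the live page_source (after clicking "Show more") shows the current class value, which you can then put into your selector.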
Code should be:
txt = [x.get_text().strip() for x in soup.select('[class="AuthorGroups"] dl dd')]
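Note that [class="AuthorGroups"] still requires the class attribute to equal that exact string, so it will break again if the site adds another class to the element. A class selector such as .AuthorGroups dl dd matches whenever AuthorGroups is one of the element's classes. The comparison below is a sketch on two hypothetical snippets (OLD_HTML and NEW_HTML are made up to mimic the old and new class lists; extract is an illustrative helper, not part of the original code):

```python
from bs4 import BeautifulSoup

# Two hypothetical versions of the same markup: the old class list and the new one.
OLD_HTML = '<div class="AuthorGroups text-s"><dl><dt>a</dt><dd>Old affiliation</dd></dl></div>'
NEW_HTML = '<div class="AuthorGroups"><dl><dt>a</dt><dd>New affiliation</dd></dl></div>'


def extract(html, selector):
    """Return the stripped text of every element matched by `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return [dd.get_text().strip() for dd in soup.select(selector)]


for label, html in (("old", OLD_HTML), ("new", NEW_HTML)):
    exact = extract(html, '[class="AuthorGroups"] dl dd')  # exact attribute match
    token = extract(html, '.AuthorGroups dl dd')           # class-token match
    print(label, exact, token)
# old [] ['Old affiliation']
# new ['New affiliation'] ['New affiliation']
```

In the original script, that would mean using soup.select('.AuthorGroups dl dd') in place of the attribute selector, assuming the affiliations stay under an element carrying the AuthorGroups class.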