python selenium-webdriver web-scraping

Extracting text rendered by JavaScript using Python gives empty output


I used to use a script to extract the affiliation text from this page: https://www.sciencedirect.com/science/article/abs/pii/S0011916424004600. The affiliation text appears after you click "Show more" at the top of the page.

However, now the code is giving me empty output for some reason.

This is the code that used to work:

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By


service = Service(r"Z:\Private\hbasamh\ACWA Power\Files\Jupyter\Web Scraping\chromedriver.exe")
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
#options.add_argument("--headless=new")
#options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=service, options=options)
url = 'https://www.sciencedirect.com/science/article/abs/pii/S0011916424004600'
driver.get(url)
time.sleep(3)

driver.find_element(By.XPATH, '//span[@class="button-link-text" and contains(text(), "Show more")]').click()
time.sleep(2)

soup = BeautifulSoup(driver.page_source, "html.parser")

txt = [x.get_text().strip() for x in soup.select('[class="AuthorGroups text-s"] dl dd')]
print(txt)

driver.quit()

And this is the expected output:

a
College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China
b
Natural Sciences and Science Education, National Institute of Education, Nanyang Technological University, Singapore 637616, Singapore
c
Department of Science Education, Rey Juan Carlos University, Madrid 28942, Spain

Can anyone please let me know what is wrong?


Solution

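  • It looks like the website's HTML was updated.

    I no longer see any element whose class attribute is "AuthorGroups text-s"; it is now just "AuthorGroups". The selector [class="..."] matches the whole attribute value exactly, so the old selector matches nothing and the list comes back empty.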

    The selector line should be:

    txt = [x.get_text().strip() for x in soup.select('[class="AuthorGroups"] dl dd')]
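
To see why the old selector stopped matching, here is a minimal sketch that runs both selectors against two small HTML snippets. The snippets and the affiliations helper are hypothetical and made up for illustration; the real page's markup may differ. The sketch also tries the class selector .AuthorGroups, which is not part of the answer above but may be a bit more robust, since it matches on a single class name rather than on the full attribute string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page's HTML may differ.
OLD_HTML = '<div class="AuthorGroups text-s"><dl><dd>Affiliation A</dd><dd>Affiliation B</dd></dl></div>'
NEW_HTML = '<div class="AuthorGroups"><dl><dd>Affiliation A</dd><dd>Affiliation B</dd></dl></div>'

EXACT_OLD = '[class="AuthorGroups text-s"] dl dd'  # exact match on the old attribute value
EXACT_NEW = '[class="AuthorGroups"] dl dd'         # exact match on the new attribute value
BY_CLASS = '.AuthorGroups dl dd'                   # matches any element that has this class


def affiliations(html, selector):
    """Return the stripped text of every element matched by `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return [dd.get_text(strip=True) for dd in soup.select(selector)]


for label, html in (("old markup", OLD_HTML), ("new markup", NEW_HTML)):
    for selector in (EXACT_OLD, EXACT_NEW, BY_CLASS):
        print(label, "|", selector, "|", affiliations(html, selector))
```

In this sketch each exact-match selector only finds the markup it was written for, while the class selector finds the dd elements in both snippets. Whether that holds on the live page depends on its actual HTML, so it is worth checking the element in the browser's developer tools first.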