I am trying to scrape the Abstract of the research paper on IEEE Xplore website, link :. For this I used urllib library and Beautifulsoup in Python(3.10.9). Below is the code i have used:
`
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
url = 'https://ieeexplore.ieee.org/document/8480057'
headers = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'}
req = Request(url, headers=headers)
response = urlopen(req, timeout=10)
html_text = response.read()
soup = BeautifulSoup(html_text, "html.parser")
# Find the element containing the abstract
abstract_element = soup.find("div", class_="u-pb-1")
# Extract the text from the abstract element
abstract = abstract_element.text.strip()
# Print the abstract
print(abstract)
`
Here i have attached the screenshot of html part having Abstract.
I am getting AttributeError: 'NoneType' object has no attribute 'text' for abstract.
I got the value of soup. But I don't know how to get the Abstract text. I am new to web scraping. I have tried a lot but didn't able to solve it. Please help me to solve this problem. Thanks.
In this case it's actually possible to get the abstract without the need for Selenium. I generally use the requests
library and css selectors in BeautifulSoup and that's what I did here:
import requests
req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.text, "html.parser")
and then simply:
print(soup.select_one('meta[property="og:description"]')['content'])
The output should be the abstract.