I can extract data from a webpage that has unique identifiers, but not from one that has none
I have already extracted data from a webpage whose elements have unique identifiers such as class, span, or id, but what should I do when the page doesn't have a unique identifier?
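import requests
from bs4 import BeautifulSoup

url = "https://dblp.org/"
r = requests.get(url)
print(r.content)
b = BeautifulSoup(r.text, "html.parser")
print(b.prettify())
# Look for a <ul> with id="browsable"; no such id is available on the page
a = b.find_all('ul', {"id": "browsable"})
print(a)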
It returns no results, whereas I expected a list of the links available on the page.
You can use a type selector for <a> tags within <li> elements. Using the <body> parent tag as an example, you can then get the href attributes of the <a> children of the <li> elements with the following:
import requests
from bs4 import BeautifulSoup
url = 'https://dblp.org/'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
links = [item['href'] for item in soup.select('body li a')]
print(links)
If the links must also have a <ul> parent tag, then use this selector instead:
body ul li a
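To see how the two selectors differ without making a network request, here is a minimal sketch run against a made-up HTML fragment. The markup and the paths in it are assumptions for illustration only and are not taken from dblp.org; html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; it is not copied from dblp.org.
html = """
<html><body>
  <ul>
    <li><a href="/first">First</a></li>
    <li><a href="/second">Second</a></li>
  </ul>
  <ol>
    <li><a href="/third">Third</a></li>
  </ol>
  <p><a href="/outside">Not in a list</a></p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <a> nested inside an <li> that is somewhere inside <body>.
all_list_links = [a["href"] for a in soup.select("body li a")]

# Same as above, but the <li> must also be nested inside a <ul>.
ul_only_links = [a["href"] for a in soup.select("body ul li a")]

print(all_list_links)
print(ul_only_links)
```

Note that item['href'] raises a KeyError for an anchor without an href attribute; if the page may contain such anchors, the attribute selector a[href] (for example body li a[href]) skips them.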
It is also worth noting that two of the script tags on the page contain a JSON structure with links, which may be useful depending on your needs.
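If the JSON inside the script tags is what you need, one possible approach is to try parsing each script's text and keep whatever decodes. This is only a sketch: the fragment below, including the application/ld+json type and the example.org URLs, is a hypothetical stand-in, and the actual contents of the script tags on dblp.org may be structured differently.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical page fragment; the real script contents on dblp.org may differ.
html = """
<html><head>
<script type="application/ld+json">
{"@type": "WebSite", "url": "https://example.org/",
 "sameAs": ["https://example.org/a", "https://example.org/b"]}
</script>
<script>console.log("not JSON");</script>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

parsed = []
for script in soup.find_all("script"):
    text = script.string  # None for empty scripts or scripts loaded via src
    if not text:
        continue
    try:
        parsed.append(json.loads(text))
    except json.JSONDecodeError:
        # Ordinary JavaScript rather than a JSON document; skip it.
        continue

print(parsed)
```

Trying json.loads on every script is a blunt filter; once you know which script tags hold the data, selecting them directly (for example by their type attribute) is more precise.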