I want to access the href links on a page, but my HTML is a nested structure like the one in the image below.
I'm trying to do that with BeautifulSoup4, but I'm new to web scraping. The code I'm using is:
import requests
from bs4 import BeautifulSoup
import time

url = "https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368301/DA+API+-+Canais+de+Atendimento"
response = requests.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    page_body = soup.find_all('div', class_='_1bsb1osq _19pkidpf _2hwx1wug _otyridpf _18u01wug')
    for p in page_body:
        print(p.find_all('a'))
else:
    print(f"Failed to retrieve content. Status Code: {response.status_code}")
However, my print shows an empty list: []
My question is: is there a way to access this element directly?
The desired data is loaded dynamically, so BeautifulSoup alone can't retrieve it: the HTML returned by requests does not yet contain the links. You can get the data either with Selenium or from an API request. Here I use Selenium together with BeautifulSoup,
and it works fine now.
SCRIPT:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# keep Chrome open after the script finishes
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368301/DA+API+-+Canais+de+Atendimento")
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'lxml')
page_body = soup.select('ul.childpages-macro.conf-macro.output-block li')
for p in page_body:
    href = p.a.get('href')
    print('https://openfinancebrasil.atlassian.net' + href)
OUTPUT:
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/223773060
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/297533441
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/297533461
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/297533518
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/297533542
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/297533567
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368404
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368427
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368487
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368514
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368537/v1.0.1+-+Canais+de+Atendimentos
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368560
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368587/v1.0.0-rc5.2+-+Canais+de+Atendimentos
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/17368610
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/223805833
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/223805853
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/223805910
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/223805934
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/266895490
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/266895510
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/266895567
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/266895591
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/282886145
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/282886165
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/282886222
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/282886246
https://openfinancebrasil.atlassian.net/wiki/spaces/OF/pages/283901953
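As a side note, the script builds the full URLs by string concatenation, which works here because every href in that list starts with /wiki/. A minimal sketch of an alternative, using urljoin from the standard library, is shown below. The helper name to_absolute and the example.com href are only illustrative and are not part of the original script.

```python
from urllib.parse import urljoin

# Site root used in the script above
BASE_URL = "https://openfinancebrasil.atlassian.net"


def to_absolute(href, base=BASE_URL):
    """Resolve an href against the site root (hypothetical helper)."""
    return urljoin(base, href)


# Root-relative href, like the ones found in the child-pages list
print(to_absolute("/wiki/spaces/OF/pages/223773060"))

# An href that is already absolute is returned unchanged
print(to_absolute("https://example.com/other"))
```

Inside the loop, print(to_absolute(href)) could then replace the concatenation, so an href that is already a full URL would not end up with the host prepended twice.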