Is there a way to get the link from HTML that contains both an <a> tag and a <div> tag?
html1:
<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
<div class="subtitle">
Service request #2226754
</div></a>
html2:
<div class="subtitle">
Service request <a href="https://u5024.ct.sendgrid.net/ls" style="color:#5A88AA; text-decoration:underline;" target="_blank">#2604467</a>
</div>
code:
import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
scores_string = soup.find("div", text=re.compile('Service request', re.IGNORECASE))
print(scores_string)
ahref = scores_string.find_parent("a")
print(ahref["href"])
Required output:
1) https://u50.ct.sendgrid.net/ls
2) https://u5024.ct.sendgrid.net/ls
I have two HTML snippets in different formats, and I need to extract the URL from each. Is there a solution using BeautifulSoup?
You can implement a custom tag filter. My solution doesn't need an extra import for regexes, but for more complex cases one may be required or advisable.
def f(tag):
    text = 'Service request'.casefold()
    # case 1: an <a href=...> that directly contains a matching <div>
    if tag.name == "a" and 'href' in tag.attrs:
        for child_tag in tag.children:
            if child_tag.name == 'div' and child_tag.get_text(strip=True).casefold().startswith(text):
                return True
    # case 2: a matching <div> that directly contains an <a href=...>
    if tag.name == 'div' and tag.get_text(strip=True).casefold().startswith(text):
        for child_tag in tag.children:
            if child_tag.name == "a" and 'href' in child_tag.attrs:
                return True
    return False

# matches
for m in soup.find_all(f):
    # "destructuring": if the match is the <div>, take its <a> child
    if m.name != 'a':
        m = m.a
    print(m['href'])
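For reference, here is a minimal self-contained sketch of the same filter idea applied to both snippets from the question. The names `HTML1`, `HTML2`, `PREFIX`, `starts_with_prefix`, `link_filter` and `extract_links` are made up for this illustration, and it uses `find_all(..., recursive=False)` to look at direct children only, which is a small variation on the loop over `tag.children`, not a different technique.

```python
from bs4 import BeautifulSoup

# Hypothetical names; the two snippets are copied from the question.
PREFIX = "service request"  # compared case-insensitively via casefold()

HTML1 = """<a href="https://u50.ct.sendgrid.net/ls" target="_blank">
<div class="subtitle">
Service request #2226754
</div></a>"""

HTML2 = """<div class="subtitle">
Service request <a href="https://u5024.ct.sendgrid.net/ls" style="color:#5A88AA; text-decoration:underline;" target="_blank">#2604467</a>
</div>"""


def starts_with_prefix(tag):
    """True if the tag's combined text starts with PREFIX, ignoring case."""
    return tag.get_text(strip=True).casefold().startswith(PREFIX)


def link_filter(tag):
    """Match either layout: <a> wrapping the <div>, or <div> wrapping the <a>."""
    # Layout 1: an <a href=...> with a matching <div> as a direct child.
    if tag.name == "a" and tag.has_attr("href"):
        return any(starts_with_prefix(d)
                   for d in tag.find_all("div", recursive=False))
    # Layout 2: a matching <div> with an <a href=...> as a direct child.
    if tag.name == "div" and starts_with_prefix(tag):
        return tag.find("a", href=True, recursive=False) is not None
    return False


def extract_links(html):
    """Return the href of every link matched by link_filter, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for match in soup.find_all(link_filter):
        # Normalize: if the <div> matched, take its direct <a> child.
        if match.name != "a":
            match = match.find("a", href=True, recursive=False)
        links.append(match["href"])
    return links


if __name__ == "__main__":
    print(extract_links(HTML1))
    print(extract_links(HTML2))
```

Because the filter is checked against every tag, the inner `<div>` of the first snippet and the inner `<a>` of the second are also visited, but neither satisfies its branch, so each snippet should yield exactly one link.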