I am scraping 10-Q documents from SEC EDGAR.
This is the URL: https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm
I need to extract "1600 Amphitheatre Parkway" without using the id attribute. Below is a code snippet that extracts the text using the id attribute; however, I need to use the name attribute instead.
from requests_html import HTMLSession
from bs4 import BeautifulSoup
session = HTMLSession()
page = session.get('https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm')
soup = BeautifulSoup(page.content, 'html.parser')
content = soup.find(id="d92517213e644-wk-Fact-0B11263160365DBABCF89969352EE602")
print(content.text)
Instead of the id attribute, I would like to use the name attribute, but I am not able to extract the information using it. Please help.
See the HTML information:
How can I use the name attribute instead of the id attribute to extract the contents?
Thanks
You can find elements based on attribute values like this:
soup.find('html_tag',{"attribute":"value"})
So in your case, the name attribute exists on the ix:nonnumeric tag:
content = soup.find('ix:nonnumeric',{"name":"dei:EntityAddressAddressLine1"})
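To try this without hitting the SEC site, here is a minimal sketch that runs the same lookup on a small inline snippet. The SAMPLE_HTML markup is a hypothetical, simplified stand-in for the filing's inline XBRL (the ids, contextRef values, and the second fact are made up for illustration), so treat it as an approximation of the real page rather than a copy of it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the inline XBRL markup in the filing.
SAMPLE_HTML = """
<div>
  <ix:nonNumeric id="fact-1" name="dei:EntityAddressAddressLine1" contextRef="c1">1600 Amphitheatre Parkway</ix:nonNumeric>
  <ix:nonNumeric id="fact-2" name="dei:EntityAddressCityOrTown" contextRef="c1">Mountain View</ix:nonNumeric>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# html.parser lowercases tag names, so search for 'ix:nonnumeric'.
# The attribute filter goes in a dict because find()'s own `name`
# parameter refers to the tag name, not to an HTML attribute called "name".
tag = soup.find("ix:nonnumeric", {"name": "dei:EntityAddressAddressLine1"})
address = tag.get_text(strip=True) if tag is not None else None
print(address)

# The tag name can be omitted to match on the attribute alone.
city_tag = soup.find(attrs={"name": "dei:EntityAddressCityOrTown"})
city = city_tag.get_text(strip=True) if city_tag is not None else None
print(city)
```

Checking for None before reading the text avoids an AttributeError when the attribute value is not present in a given filing.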