I have an XBRL document, which should be an XML document.
I am trying to extract tags grouped by their namespace prefix. The code works for some prefixes (us-gaap) but fails for others (xbrli), even though the XML file contains plenty of tags of the form <xbrli:...>.
Code:
from bs4 import BeautifulSoup

with open('test.xml', 'r') as fp:
    raw_text = fp.read()

# The 'xml' parser requires lxml to be installed.
soup = BeautifulSoup(raw_text, 'xml')

# Count the tags whose namespace prefix matches.
print(len(soup.find_all(lambda tag: tag.prefix == 'us-gaap')))  # prints 941
print(len(soup.find_all(lambda tag: tag.prefix == 'xbrli')))  # prints 0
You can find the test.xml file here.
Using BeautifulSoup 4.8.1 solved the issue.
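To check that the file really contains xbrli elements, independently of the installed BeautifulSoup version, one option is to count elements per namespace URI with the standard library's xml.etree.ElementTree. This is only a sketch and uses a different technique from the code above: it groups by namespace URI, not by prefix. The helper name count_by_namespace and the tiny inline sample are hypothetical, and the us-gaap URI is just an example year; the real test.xml may declare different URIs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature document standing in for test.xml.
# The us-gaap namespace URI is an example; check the xmlns declarations in your file.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:us-gaap="http://fasb.org/us-gaap/2019-01-31">
  <xbrli:context id="c1"/>
  <xbrli:unit id="usd"/>
  <us-gaap:Assets contextRef="c1" unitRef="usd">100</us-gaap:Assets>
</xbrli:xbrl>"""


def count_by_namespace(xml_text):
    """Return a Counter mapping namespace URI -> number of elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():  # iter() includes the root element itself
        # ElementTree stores qualified names as "{namespace-uri}localname".
        if elem.tag.startswith("{"):
            uri = elem.tag[1:].split("}", 1)[0]
        else:
            uri = ""  # element without a namespace
        counts[uri] += 1
    return counts


counts = count_by_namespace(SAMPLE)
for uri, n in counts.items():
    print(uri, n)
```

For the real file, the same helper could be called with the text read from test.xml. A non-zero count for the xbrli namespace would suggest that the elements are present and that the zero above comes from how the parser reports prefixes.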