Tags: python, xml, beautifulsoup, namespaces, xbrl

BeautifulSoup can't read all the namespaces


I have an XBRL document, which should be an XML document.

I am trying to extract tags grouped by their namespace prefix. The code works for some prefixes (us-gaap) but fails for others (xbrli), even though the XML file contains plenty of <xbrli:...> tags.

Code:

from bs4 import BeautifulSoup

with open('test.xml', 'r') as fp:
    raw_text = fp.read()

soup = BeautifulSoup(raw_text, 'xml')

# Count the tags under each namespace prefix.
print(len(soup.find_all(lambda tag: tag.prefix == 'us-gaap')))  # prints 941
print(len(soup.find_all(lambda tag: tag.prefix == 'xbrli')))    # prints 0

You can find the test.xml file here.
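
One way to check whether the xbrli tags are really in the file, independently of BeautifulSoup, is to count elements per prefix with the standard library. The following is only a sketch: the inline SAMPLE document stands in for test.xml, the us-gaap URI in it is made up for illustration, and count_by_prefix is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for test.xml; the us-gaap URI below is invented.
SAMPLE = """<?xml version="1.0"?>
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:us-gaap="http://example.com/us-gaap">
  <xbrli:context id="c1"/>
  <xbrli:unit id="u1"/>
  <us-gaap:Assets contextRef="c1" unitRef="u1">100</us-gaap:Assets>
</xbrli:xbrl>
"""


def count_by_prefix(source):
    """Count elements per namespace prefix, using the prefixes declared in the document."""
    uri_to_prefix = {}
    counts = Counter()
    # "start-ns" events report (prefix, uri) pairs before the declaring element starts.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            uri_to_prefix[uri] = prefix
        else:
            tag = payload.tag  # ElementTree spells qualified names as "{uri}local"
            uri = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
            counts[uri_to_prefix.get(uri, "")] += 1
    return counts


counts = count_by_prefix(io.BytesIO(SAMPLE.encode("utf-8")))
print(counts["us-gaap"], counts["xbrli"])
```

Passing the path 'test.xml' instead of the in-memory sample should work the same way, since iterparse accepts a filename. If one URI is bound to several prefixes, this sketch keeps only the last one seen.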


Solution

  • Using BeautifulSoup 4.8.1 solved the issue.
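
Since the fix depends on the library version, it may help to confirm which version is actually installed in the environment running the script. A minimal sketch, assuming Python 3.8 or later; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "beautifulsoup4" is the distribution name on PyPI; the import name is bs4.
print(installed_version("beautifulsoup4"))
```

The 'xml' parser in BeautifulSoup is backed by lxml, so calling installed_version("lxml") as well shows the other half of the setup.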