
Python, OOXML, ElementTree and document root attributes


ElementTree (Python 2.7) does not see the attributes of the root element. For example, for the tag <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"> I get an empty dictionary. I want to get the namespace "on the fly" so I can work with the tags. The xml.dom.minidom library works fine, but I don't want to lose the features of ET. Code example:

from xml.etree import ElementTree as ET
import zipfile

path = '/path/to/sample.docx'
# A .docx file is a ZIP archive; the document body lives in word/document.xml
with zipfile.ZipFile(path, 'r') as zf:
    root = ET.fromstring(zf.read('word/document.xml'))
print(root.tag, root.attrib)  # =>
# ('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}document', {})
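The namespace URI is not entirely lost here: the tag printed above carries it in ElementTree's `{uri}localname` form, so it can be split off directly. A minimal sketch, assuming the root belongs to a single namespace as in the output above; `split_clark` and the inline `SAMPLE` bytes are hypothetical, standing in for `zf.read('word/document.xml')` so the example runs without a .docx file. Note that this recovers the URI only, not the `w` prefix.

```python
from xml.etree import ElementTree as ET

# Hypothetical stand-in for zf.read('word/document.xml'); a real
# document.xml declares many more namespaces and has real content.
SAMPLE = (
    b'<w:document xmlns:w="http://schemas.openxmlformats.org/'
    b'wordprocessingml/2006/main"><w:body/></w:document>'
)


def split_clark(tag):
    """Split an ElementTree tag like '{uri}local' into (uri, local).

    Tags with no namespace come back as (None, tag).
    """
    if tag.startswith('{'):
        uri, _, local = tag[1:].partition('}')
        return uri, local
    return None, tag


root = ET.fromstring(SAMPLE)
ns_uri, local_name = split_clark(root.tag)
print(ns_uri)      # http://schemas.openxmlformats.org/wordprocessingml/2006/main
print(local_name)  # document

# The recovered URI can then be used to build qualified tags for lookups.
body = root.find('{%s}body' % ns_uri)
print(body is not None)  # True
```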

Solution

  • An XML namespace declaration (anything starting with xmlns:) is not treated as an ordinary attribute. I think that's why you're not seeing it in the attrib dictionary. There are other ways of working with namespaces, so if you can say more about what you're trying to accomplish I may be able to help further.

    The namespaces (and their prefixes) of WordprocessingML elements are well known, documented, and relatively few in number: a few tens at most, with only a small handful appearing in most documents. So depending on what you're trying to accomplish, this may be easier than it seems.
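One such alternative, sketched minimally here and assuming Python 3 (it should also run on 2.7): `ET.iterparse` can report each namespace declaration as a `'start-ns'` event while it builds the tree, which yields the prefix-to-URI map on the fly without switching to minidom. The function `parse_with_namespaces` and the inline `SAMPLE` document are hypothetical, standing in for `zf.read('word/document.xml')`.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical stand-in for zf.read('word/document.xml').
SAMPLE = (
    b'<w:document'
    b' xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    b' xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    b'<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>'
    b'</w:document>'
)


def parse_with_namespaces(data):
    """Parse XML bytes and return (root, {prefix: uri}).

    'start-ns' events carry the (prefix, uri) pairs that never show up
    in Element.attrib. If a prefix is redeclared, the last one wins.
    """
    namespaces = {}
    root = None
    for event, item in ET.iterparse(io.BytesIO(data), events=('start', 'start-ns')):
        if event == 'start-ns':
            prefix, uri = item
            namespaces[prefix] = uri
        elif root is None:
            root = item  # the first 'start' element is the document root
    return root, namespaces


root, ns = parse_with_namespaces(SAMPLE)
print(sorted(ns))  # ['r', 'w']

# The collected map can be passed straight to find/findall/iterfind.
texts = [t.text for t in root.iterfind('.//w:t', ns)]
print(texts)  # ['Hello']
```

Because the WordprocessingML namespaces are fixed, a hand-written map such as `{'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}` passed to `findall` works just as well, since matching is done on the URI rather than the prefix. Collecting the declarations at parse time mainly helps when you want to discover which namespaces a particular document actually declares.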